Production Hardening

The Starter pack plus 15 extra rules for clusters carrying real traffic.

Why this bundle

Defaults for clusters carrying real traffic.

Twenty-five policies: the ten-policy Starter pack plus fifteen extra rules covering ISR floors, schema enforcement, ACL hygiene, connector parallelism caps, and DLQ wiring. These are the rules most teams add only after their first incident.

Apply the whole bundle

One concatenated YAML stream with every ResourcePolicy in this bundle. Copy, save, apply.

# All policies in the Production Hardening bundle (25 resources)
# Save as bundle-production-hardening.yaml then: conduktor apply -f bundle-production-hardening.yaml
# Each ResourcePolicy must still be linked via Application(Instance).spec.policyRef
# or KafkaCluster.spec.policiesRef to take effect.
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: topic-name-convention
spec:
  targetKind: Topic
  description: Topic names must follow <env>.<domain>.<entity>.<version>
  rules:
    - condition: metadata.name.matches("^(dev|staging|prod)\\.[a-z0-9-]+\\.[a-z0-9-]+\\.v[0-9]+$")
      errorMessage: "Topic name must match <env>.<domain>.<entity>.<version> (e.g. prod.orders.placed.v1)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: consumer-group-prefix
spec:
  targetKind: ApplicationGroup
  description: Consumer-group / ApplicationGroup names must be <team>.<app>.cg
  rules:
    - condition: metadata.name.matches("^[a-z][a-z0-9-]+\\.[a-z0-9-]+\\.cg$")
      errorMessage: "Consumer group must be <team>.<app>.cg (e.g. orders.fraud-detector.cg)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: partition-count-bounds
spec:
  targetKind: Topic
  description: Topics must have between 1 and 200 partitions
  rules:
    - condition: spec.partitions >= 1 && spec.partitions <= 200
      errorMessage: "Partitions must be between 1 and 200 (request an override for outliers)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: min-replication-factor
spec:
  targetKind: Topic
  description: Production topics must have replication factor >= 3
  rules:
    - condition: '!metadata.name.startsWith("prod.") || spec.replicationFactor >= 3'
      errorMessage: "Production topics must have replication factor >= 3"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: isr-alignment
spec:
  targetKind: Topic
  description: min.insync.replicas must equal replicationFactor - 1
  rules:
    - condition: '"min.insync.replicas" in spec.configs && int(string(spec.configs["min.insync.replicas"])) == int(spec.replicationFactor) - 1'
      errorMessage: "min.insync.replicas must equal replicationFactor - 1 (e.g. RF=3 -> min.insync.replicas=2)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: max-retention-bound
spec:
  targetKind: Topic
  description: retention.ms must be <= 30d and not infinite, unless labels.retention-justified == "true"
  rules:
    - condition: '("retention-justified" in metadata.labels && metadata.labels["retention-justified"] == "true") || ("retention.ms" in spec.configs && int(string(spec.configs["retention.ms"])) != -1 && int(string(spec.configs["retention.ms"])) <= 2592000000)'
      errorMessage: "retention.ms must be <= 30 days and not -1 (set label retention-justified=true to override)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: compression-allowlist
spec:
  targetKind: Topic
  description: compression.type must be lz4 or zstd
  rules:
    - condition: '"compression.type" in spec.configs && spec.configs["compression.type"] in ["lz4", "zstd"]'
      errorMessage: "compression.type must be lz4 or zstd"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: schema-required-non-internal
spec:
  targetKind: Topic
  description: Non-internal topics must declare a schema subject via labels.schema-subject
  rules:
    - condition: 'metadata.name.startsWith("__") || metadata.name.startsWith("_") || ("schema-subject" in metadata.labels && size(metadata.labels["schema-subject"]) > 0)'
      errorMessage: "Non-internal topics must set label schema-subject=<subject-name> (or be Schema-Registry-governed)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: no-wildcard-acl-prod
spec:
  targetKind: ApplicationGroup
  description: ApplicationGroups touching prod.* resources cannot use wildcard LITERAL resource patterns
  rules:
    - condition: 'spec.permissions.all(p, !(p.name.startsWith("prod.") || p.name == "*") || (p.patternType != "LITERAL" || p.name != "*"))'
      errorMessage: "Wildcard LITERAL resource is not allowed on production resources"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: max-message-bytes-bound
spec:
  targetKind: Topic
  description: max.message.bytes must be <= 10 MB
  rules:
    - condition: '!("max.message.bytes" in spec.configs) || int(string(spec.configs["max.message.bytes"])) <= 10485760'
      errorMessage: "max.message.bytes must be <= 10 MB; for larger payloads consider the claim-check pattern (store the payload in S3 and publish a reference)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: min-insync-replicas-bound
spec:
  targetKind: Topic
  description: min.insync.replicas must be >= 2 and <= replicationFactor - 1
  rules:
    - condition: '"min.insync.replicas" in spec.configs && int(string(spec.configs["min.insync.replicas"])) >= 2 && int(string(spec.configs["min.insync.replicas"])) <= int(spec.replicationFactor) - 1'
      errorMessage: "min.insync.replicas must be >= 2 and <= replicationFactor - 1 (durability vs availability tradeoff)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: partition-count-power-aligned
spec:
  targetKind: Topic
  description: partitions must be one of [1, 3, 6, 12, 24, 48, 96]
  rules:
    - condition: 'spec.partitions in [1, 3, 6, 12, 24, 48, 96]'
      errorMessage: "partitions must be one of [1, 3, 6, 12, 24, 48, 96]; counts outside this ladder tend to divide unevenly across consumer instances and leave the group skewed"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: retention-bytes-required
spec:
  targetKind: Topic
  description: delete-policy topics must declare retention.bytes > 0
  rules:
    - condition: '!("cleanup.policy" in spec.configs) || !spec.configs["cleanup.policy"].contains("delete") || ("retention.bytes" in spec.configs && int(string(spec.configs["retention.bytes"])) > 0)'
      errorMessage: "retention.bytes must be set (>0) on delete-policy topics; with time-only retention, disk usage is unbounded"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: compact-topic-requires-tombstone-retention
spec:
  targetKind: Topic
  description: compacted topics must declare delete.retention.ms in [1h, 30d]
  rules:
    - condition: '!("cleanup.policy" in spec.configs) || !spec.configs["cleanup.policy"].contains("compact") || ("delete.retention.ms" in spec.configs && int(string(spec.configs["delete.retention.ms"])) >= 3600000 && int(string(spec.configs["delete.retention.ms"])) <= 2592000000)'
      errorMessage: "compacted topics must declare delete.retention.ms in [1h, 30d] — default 24h often breaks slow consumers"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: segment-bytes-sane-bound
spec:
  targetKind: Topic
  description: segment.bytes must be in [64 MiB, 2 GiB)
  rules:
    - condition: '!("segment.bytes" in spec.configs) || (int(string(spec.configs["segment.bytes"])) >= 67108864 && int(string(spec.configs["segment.bytes"])) <= 2147483647)'
      errorMessage: "segment.bytes must be at least 64MiB and below 2GiB (Kafka stores it as a 32-bit int, so the ceiling is 2147483647): tiny segments blow up file count, huge segments delay retention"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: topic-owner-label-required
spec:
  targetKind: Topic
  description: every topic must carry owner (email) and data-criticality (C0..C3) labels
  rules:
    - condition: 'has(metadata.labels.owner) && metadata.labels.owner.matches("^[a-z0-9._-]+@[a-z0-9.-]+\\.[a-z]{2,}$") && "data-criticality" in metadata.labels && metadata.labels["data-criticality"] in ["C0", "C1", "C2", "C3"]'
      errorMessage: "metadata.labels.owner (email) and metadata.labels[\"data-criticality\"] (C0..C3) are required — unowned topics become stale and unpageable"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: topic-domain-prefix-matches-app
spec:
  targetKind: Topic
  description: topic name must start with "<domain>." matching metadata.labels.domain
  rules:
    - condition: 'has(metadata.labels.domain) && metadata.name.startsWith(metadata.labels.domain + ".")'
      errorMessage: "topic name must start with \"<domain>.\" matching metadata.labels.domain (keeps naming, ACL prefix and chargeback in sync)"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: connector-class-allowlist
spec:
  targetKind: Connector
  description: connector.class must be on the vetted allowlist
  rules:
    - condition: |
        spec.config["connector.class"] in [
          "io.confluent.connect.jdbc.JdbcSinkConnector",
          "io.confluent.connect.jdbc.JdbcSourceConnector",
          "io.debezium.connector.postgresql.PostgresConnector",
          "io.debezium.connector.mysql.MySqlConnector",
          "io.confluent.connect.s3.S3SinkConnector",
          "org.apache.kafka.connect.mirror.MirrorSourceConnector",
          "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector"
        ]
      errorMessage: "connector.class is not on the vetted allowlist — ask the platform team to certify the plugin first"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: connector-tasks-max-bound
spec:
  targetKind: Connector
  description: tasks.max must be in [1, 16]
  rules:
    - condition: '"tasks.max" in spec.config && int(string(spec.config["tasks.max"])) >= 1 && int(string(spec.config["tasks.max"])) <= 16'
      errorMessage: "tasks.max must be in [1, 16] — higher values starve other tenants on the Connect cluster"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: connector-error-tolerance-bounded
spec:
  targetKind: Connector
  description: errors.tolerance=all requires errors.deadletterqueue.topic.name to be set
  rules:
    - condition: '!("errors.tolerance" in spec.config) || spec.config["errors.tolerance"] != "all" || ("errors.deadletterqueue.topic.name" in spec.config && spec.config["errors.deadletterqueue.topic.name"] != "")'
      errorMessage: "errors.tolerance=all requires errors.deadletterqueue.topic.name — otherwise poison records are silently dropped"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: subject-naming-strategy-topic-value
spec:
  targetKind: Subject
  description: subject name must end with -key or -value (TopicNameStrategy)
  rules:
    - condition: 'metadata.name.endsWith("-key") || metadata.name.endsWith("-value")'
      errorMessage: "subject name must end with \"-key\" or \"-value\" (TopicNameStrategy) — other suffixes break auto-resolution in standard SR clients"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: subject-compatibility-not-none
spec:
  targetKind: Subject
  description: spec.compatibility must be set and not NONE
  rules:
    - condition: 'has(spec.compatibility) && spec.compatibility in ["BACKWARD", "BACKWARD_TRANSITIVE", "FORWARD", "FORWARD_TRANSITIVE", "FULL", "FULL_TRANSITIVE"]'
      errorMessage: "spec.compatibility must be explicitly set and not NONE — NONE allows arbitrary schema changes that break downstream consumers"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: applicationgroup-no-wildcard-write
spec:
  targetKind: ApplicationGroup
  description: wildcard (name="*") permissions cannot include write/create/delete on topics or connectors
  rules:
    - condition: 'spec.permissions.all(p, p.name != "*" || !p.permissions.exists(perm, perm in ["topicProduce", "topicCreate", "topicDelete", "kafkaConnectCreate", "kafkaConnectDelete"]))'
      errorMessage: "wildcard (name=\"*\") permissions cannot include write/create/delete on topics or connectors — scope by prefix instead"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: applicationgroup-prod-requires-external-group
spec:
  targetKind: ApplicationGroup
  description: production ApplicationGroups must use externalGroups/externalGroupRegex (SSO sync)
  rules:
    - condition: '!spec.permissions.exists(p, p.appInstance.matches(".*-(prod|prd)$")) || (size(spec.members) == 0 && ((has(spec.externalGroups) && size(spec.externalGroups) > 0) || (has(spec.externalGroupRegex) && size(spec.externalGroupRegex) > 0)))'
      errorMessage: "production ApplicationGroups must use externalGroups/externalGroupRegex (SSO sync) — manual spec.members lists rot when staff leave"
---
apiVersion: self-serve/v1
kind: ResourcePolicy
metadata:
  name: applicationgroup-no-subject-wildcard-read
spec:
  targetKind: ApplicationGroup
  description: SUBJECT permissions must be prefix-scoped (patternType=PREFIXED, name!="*")
  rules:
    - condition: 'spec.permissions.all(p, p.resourceType != "SUBJECT" || (p.name != "*" && p.patternType == "PREFIXED"))'
      errorMessage: "SUBJECT permissions must be prefix-scoped (patternType=PREFIXED, name!=\"*\") — schema field names leak PII structure"
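
To make the five ApplicationGroup rules in the stream above more concrete, here is a sketch of a group manifest intended to pass them. The application name, instance name, SSO group, and permission strings (`topicViewConfig`, `topicConsume`) are assumptions made for illustration, as is the exact field layout; verify them against your Console version before relying on this.

```yaml
# Hypothetical ApplicationGroup, written to satisfy the ApplicationGroup
# policies in this bundle. All names below are illustrative assumptions.
apiVersion: self-serve/v1
kind: ApplicationGroup
metadata:
  application: orders-app            # assumed application name
  name: orders.fraud-detector.cg     # <team>.<app>.cg (consumer-group-prefix)
spec:
  displayName: Fraud detector readers
  permissions:
    - appInstance: orders-app-prod   # ends in -prod, so the SSO rule applies
      resourceType: TOPIC            # not SUBJECT, so the subject rule is not triggered
      patternType: PREFIXED          # prefix scope instead of LITERAL "*"
      name: prod.orders.
      permissions: ["topicViewConfig", "topicConsume"]   # read-only, assumed names
  members: []                        # must be empty for production groups
  externalGroups:
    - GP-ORDERS-FRAUD                # assumed SSO group name
```

Scoping by prefix rather than by `name: "*"` is what keeps this group clear of both wildcard rules, and the empty `members` list with a populated `externalGroups` is what `applicationgroup-prod-requires-external-group` checks for.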

Each policy must still be linked via Application(Instance).spec.policyRef or KafkaCluster.spec.policiesRef to take effect.
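
As a sketch of what that linking might look like, the manifest below attaches a few of the bundle's policies to an application instance through `spec.policyRef`. Only the `policyRef` field name and the policy names come from this page; the instance, cluster, and service-account names are hypothetical, and the surrounding fields are an assumption about the ApplicationInstance layout.

```yaml
# Hypothetical ApplicationInstance linking policies from this bundle.
# Everything except policyRef and the policy names is an assumption.
apiVersion: self-serve/v1
kind: ApplicationInstance
metadata:
  application: orders-app          # assumed application name
  name: orders-app-prod            # assumed instance name
spec:
  cluster: prod-cluster            # assumed cluster name
  serviceAccount: sa-orders-prod   # assumed service account
  policyRef:                       # policies enforced for this instance
    - topic-name-convention
    - min-replication-factor
    - isr-alignment
    - max-retention-bound
  resources:
    - type: TOPIC
      patternType: PREFIXED
      name: prod.orders.
```

A policy that is applied but never referenced this way would sit unused, so it may be worth listing every policy from the bundle that targets the resource kinds the instance owns.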

Policies in this bundle

Grouped by category, with the number of policies in each.

Naming: 2
Partitions: 2
Replication: 3
Retention: 3
Compression: 1
Cleanup Policy: 1
Schema Enforcement: 3
Security & ACLs: 4
Resource Limits: 1
Operational Hygiene: 2
Connectors: 3
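
To see how several of these categories bear on a single resource, the following is a sketch of a Topic manifest written to pass the naming, partition, replication, retention, compression, schema-label, size, and ownership rules in this bundle. The `apiVersion`, cluster name, owner address, and subject name are assumptions for illustration. The sketch does not attempt to satisfy `topic-domain-prefix-matches-app`, which expects the name to begin with the `domain` label, whereas `topic-name-convention` expects it to begin with the environment.

```yaml
# Hypothetical Topic intended to pass most Topic policies in this bundle.
# apiVersion, cluster, owner and subject values are illustrative assumptions.
apiVersion: kafka/v2
kind: Topic
metadata:
  cluster: prod-cluster                  # assumed cluster name
  name: prod.orders.placed.v1            # <env>.<domain>.<entity>.<version>
  labels:
    owner: orders-team@example.com       # email-shaped owner label
    data-criticality: C1                 # one of C0..C3
    schema-subject: prod.orders.placed.v1-value
spec:
  partitions: 12                         # on the allowed list and within 1..200
  replicationFactor: 3                   # prod topics need >= 3
  configs:
    min.insync.replicas: "2"             # replicationFactor - 1, and >= 2
    cleanup.policy: delete
    retention.ms: "604800000"            # 7 days, under the 30-day cap
    retention.bytes: "107374182400"      # 100 GiB per partition, required with delete
    compression.type: zstd               # lz4 or zstd only
    max.message.bytes: "1048576"         # 1 MiB, under the 10 MB cap
```

Config values are quoted as strings here because the conditions read them through `int(string(...))`; `segment.bytes` and `delete.retention.ms` are omitted, which the corresponding rules allow for a non-compacted topic.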

Enforce this bundle automatically?

Drop these YAMLs into Conduktor Console to get central enforcement, audit history, and pre-commit feedback for every change.
