Root cause
When the Debezium connector stops or falls behind, the restart_lsn of its PostgreSQL replication slot stops advancing. PostgreSQL must retain every WAL segment from that position onward and cannot recycle them. On high-write databases, 20–50 GB/h of WAL can accumulate silently until the disk is full.
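As a rough illustration of how quickly this adds up, the sketch below estimates the time until the disk fills at the rates quoted above. The function name is hypothetical and the 500 GB of free space is an assumed figure chosen only for the example.

```python
def hours_until_full(free_gb: float, wal_rate_gb_per_hour: float) -> float:
    """Hours until free disk space is consumed at a constant WAL growth rate."""
    if wal_rate_gb_per_hour <= 0:
        raise ValueError("WAL rate must be positive")
    return free_gb / wal_rate_gb_per_hour


# Assumed example: 500 GB free on the WAL volume (not a figure from this document).
FREE_GB = 500
for rate in (20, 50):  # GB/h range quoted above
    print(f"{rate} GB/h -> full in {hours_until_full(FREE_GB, rate):.1f} h")
```

Under these assumptions the volume fills in 10 to 25 hours, so a connector that stalls overnight can plausibly take the database down before anyone notices.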
How to fix
- Monitor pg_replication_slots:
  SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag FROM pg_replication_slots;
- Set max_slot_wal_keep_size = 50GB in postgresql.conf as a safety cap (PG 13+). PostgreSQL will invalidate the slot rather than fill the disk.
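- Set heartbeat.interval.ms to a positive value in the Debezium connector configuration (the default of 0 disables heartbeats) so the slot keeps advancing even when the captured tables are idle while other tables generate WAL.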
- Alert when slot lag exceeds 1 GB. If the slot has already been invalidated (wal_status = 'lost' in pg_replication_slots), drop it with pg_drop_replication_slot() and re-snapshot.
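The monitoring and alerting steps above could be wired together roughly as follows. This is a minimal sketch, not part of Debezium or PostgreSQL: the names LAG_QUERY, SlotStatus and classify_slot are hypothetical, the 1 GB threshold comes from the guidance above, and it assumes PostgreSQL 13+, where pg_replication_slots exposes a wal_status column and 'lost' marks an invalidated slot. Running the query and delivering the alert are left to whatever client and alerting stack is already in place.

```python
from typing import NamedTuple, Optional

# Query a monitoring job could run; returns lag in raw bytes rather than
# a pretty-printed string so it can be compared against a threshold.
LAG_QUERY = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes,
       wal_status
FROM pg_replication_slots;
"""

ALERT_THRESHOLD_BYTES = 1024 ** 3  # 1 GB, per the guidance above


class SlotStatus(NamedTuple):
    slot_name: str
    lag_bytes: Optional[int]   # None when restart_lsn is NULL
    wal_status: Optional[str]  # PG 13+: reserved, extended, unreserved, lost


def classify_slot(slot: SlotStatus, threshold: int = ALERT_THRESHOLD_BYTES) -> str:
    """Return 'invalid', 'alert', or 'ok' for one row of LAG_QUERY."""
    if slot.wal_status == "lost":
        # Slot was invalidated: drop it and re-snapshot.
        return "invalid"
    if slot.lag_bytes is not None and slot.lag_bytes > threshold:
        return "alert"
    return "ok"


# Made-up rows standing in for query results.
sample_rows = [
    SlotStatus("debezium_orders", 200 * 1024 ** 2, "reserved"),
    SlotStatus("debezium_users", 5 * 1024 ** 3, "extended"),
    SlotStatus("debezium_old", None, "lost"),
]
for row in sample_rows:
    print(row.slot_name, classify_slot(row))
```

Checking wal_status before the byte count matters because an invalidated slot may report a NULL restart_lsn, which would otherwise look like a slot with no lag.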
⚠ A frozen slot on a busy PostgreSQL server can fill the disk in hours, causing a full outage. Set max_slot_wal_keep_size and monitor slot lag.