Why PostgreSQL “Bloats” Over Time—and How to Prevent It

A practical, beginner-friendly deep dive into MVCC, WAL, buffers, VACUUM/Autovacuum, and safe remediation strategies (with ready-to-run monitoring queries).

PostgreSQL is renowned for its reliability and performance, but one challenge that catches many database administrators off guard is bloat. If you’ve noticed your PostgreSQL database growing larger over time without a proportional increase in actual data, you’re experiencing bloat—and you’re not alone. This guide explains why it happens and, more importantly, how to manage it effectively.

1. The Core Idea: MVCC Means No In-Place Updates

To understand bloat, you first need to grasp how PostgreSQL handles data modifications. Unlike some database systems that overwrite data in place, PostgreSQL uses MVCC (Multi-Version Concurrency Control), which creates a fundamentally different storage pattern:

INSERT operations write brand-new tuples (row versions) into the heap (table storage) and add corresponding index entries. This is straightforward and doesn’t immediately cause bloat.

UPDATE operations create entirely new tuple versions rather than modifying existing ones. The old version becomes “dead” (invisible to new transactions once you commit). However, there’s an optimization: if the updated columns aren’t indexed and there’s room on the same page, PostgreSQL may perform a HOT update (Heap-Only Tuple) that avoids touching indexes.
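You can watch how often a table benefits from HOT by comparing total updates to HOT updates in pg_stat_user_tables (the view and columns are standard; the results depend on when statistics were last reset):

```sql
-- Share of updates that were HOT (i.e., skipped index maintenance), per table
SELECT relname,
       n_tup_upd,                                                 -- total updates
       n_tup_hot_upd,                                             -- updates that avoided index writes
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 20;
```

A low hot_pct on a heavily updated table usually means the updates touch indexed columns, or the pages are too full to hold the new tuple version.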

DELETE operations don’t actually remove tuples from storage. Instead, they simply mark tuples as no longer visible to future transactions. The physical space remains occupied.

This is where VACUUM becomes crucial. VACUUM is the process that eventually reclaims space from dead tuples, making pages available for reuse. Until VACUUM runs, dead tuples accumulate as bloat in both the heap and indexes.

2. A Chronological Story of How Bloat Happens

Understanding bloat is easier when you see it as a timeline of events:

T0 – INSERT Phase: You add rows to your database. The heap grows naturally, and indexes get new entries. Everything is clean and efficient.

T1 – UPDATE Phase: When you update rows, new row versions are appended to storage while old versions remain on disk (though marked as dead to future transactions after commit). Index entries may be added again if indexed columns changed, or skipped if HOT updates are possible.

T2 – DELETE Phase: Deleted tuples are marked as deleted, but their storage slots remain occupied on disk. No space is immediately reclaimed.

T3 – The Waiting Game: If there’s a long-running transaction with an old snapshot, the system cannot remove those dead tuples yet. VACUUM must wait until it’s safe to clean up, meaning bloat continues to accumulate.

T4 – AUTOVACUUM Intervention: When autovacuum finally runs, it marks dead tuples as reusable and cleans indexes to some degree. However, if autovacuum is too infrequent, overwhelmed, or blocked by old snapshots, dead tuples keep piling up faster than they’re cleaned.

T5 – Page Fragmentation: Even after cleanup, space often becomes available only as scattered free slots throughout the file. Table and index files don’t automatically shrink, and scans become progressively slower due to fragmentation.

T6 – Rewrite/Rebuild (Optional): Major operations like VACUUM FULL, CLUSTER, pg_repack, or REINDEX physically rewrite storage to compact it. This is the only way to actually shrink database files.
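The whole timeline can be reproduced on a scratch table in a test database (all names here are illustrative):

```sql
CREATE TABLE bloat_demo (id int PRIMARY KEY, payload text);
INSERT INTO bloat_demo SELECT g, 'x' FROM generate_series(1, 100000) g;  -- T0: clean inserts
UPDATE bloat_demo SET payload = 'y';                                     -- T1: ~100k dead versions
DELETE FROM bloat_demo WHERE id <= 50000;                                -- T2: ~50k more

-- T3: dead tuples pile up until (auto)vacuum can run
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'bloat_demo';

VACUUM bloat_demo;                                     -- T4: space marked reusable
SELECT pg_size_pretty(pg_relation_size('bloat_demo')); -- T5/T6: file only shrinks after a rewrite
```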

3. Why MVCC + WAL + Buffers Make This Possible (and Safe)

PostgreSQL’s architecture makes this bloat pattern both inevitable and manageable through several interconnected systems:

WAL (Write-Ahead Log) ensures crash-safety by logging every change, including new tuple versions, hint bits, and page modifications. Heavy UPDATE/DELETE workloads generate substantial WAL traffic and dirty many pages in memory.

Shared Buffers keep frequently accessed pages in memory. When you modify data, these pages become dirty and must eventually be flushed to disk by the background writer or during checkpoints. Until VACUUM reclaims space, dead tuples may linger in these buffered pages.

Visibility Rules require PostgreSQL to maintain old tuple versions until it’s certain no active transaction can see them. This is why long-running transactions or read replicas applying old WAL can significantly delay cleanup operations.

This design provides consistent reads without blocking writers—a major advantage for concurrent applications. However, the trade-off is bloat risk when cleanup operations lag behind modification activity.

4. Where Bloat Shows Up

Bloat manifests in several distinct areas of your database:

Heap Bloat (Tables) occurs when dead tuples and page fragmentation accumulate in table storage. This is often the most visible form of bloat.

Index Bloat happens when deleted or updated index entries linger in index structures. As indexes grow and become fragmented, query performance degrades.

TOAST Bloat affects very large columns stored out-of-line in TOAST tables. These accumulate dead chunks that require cleanup just like regular table data.

Free Space Fragmentation occurs even after VACUUM runs. On-page holes may remain scattered throughout the file, and the file size won’t automatically shrink.
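To see how much of each table’s footprint lives in its TOAST relation, you can query the standard catalogs directly:

```sql
-- Tables with the largest TOAST storage
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid))           AS heap_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.reltoastrelid <> 0
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(c.reltoastrelid) DESC
LIMIT 20;
```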

5. Common Root Causes

Several patterns consistently lead to bloat problems:

Long-running transactions are the primary culprit, especially “idle in transaction” sessions that hold snapshots open indefinitely. These prevent VACUUM from cleaning up dead tuples that might otherwise be safely removed.

Inadequate autovacuum configuration often stems from default settings that are too conservative for high-activity databases. Scale factors may be too high, worker counts too low, or cost limits too restrictive.

High-churn tables like queues, sessions, or frequently updated user data generate dead tuples faster than most default configurations can clean them up.

Updates to indexed columns force index maintenance and prevent HOT updates, multiplying the cleanup work required and increasing bloat potential.

Aggressive page packing occurs when fillfactor is set to 100, leaving no room for HOT updates. This forces more index maintenance and reduces update efficiency.

Neglected index maintenance on heavily updated indexes leads to progressive degradation that periodic REINDEX or pg_repack operations could prevent.

6. Quick Monitoring: Find Bloat and Its Causes

The following queries work on PostgreSQL 11+ and provide immediate insight into your bloat situation:

6.1 “Dead Tuple” Hot Spots (Heap Side)

SELECT
  schemaname, relname,
  n_live_tup, n_dead_tup,
  round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup,0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 30;

This query identifies tables with the highest absolute numbers of dead tuples and shows what percentage of each table consists of dead data.

6.2 Size vs. Live Tuples (Rough Sense of Bloat)

SELECT
  relid::regclass AS table,
  pg_size_pretty(pg_relation_size(relid)) AS heap_size,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 30;

This helps you correlate table size with actual data volume, revealing cases where large tables contain relatively little live data.

6.3 Autovacuum Backlog & Progress

-- Who is vacuuming what (join pg_stat_activity to get elapsed time;
-- pg_stat_progress_vacuum itself has no start-time column)
SELECT now() - a.query_start AS elapsed, v.datname, v.relid::regclass AS table, v.phase
FROM pg_stat_progress_vacuum v
JOIN pg_stat_activity a ON a.pid = v.pid
ORDER BY elapsed DESC;

-- How often autovacuum runs (enable autovacuum logging to see more)
SHOW log_autovacuum_min_duration;  -- set to a small value like '5s' to log activity

These queries show current vacuum operations and help you configure logging to better understand autovacuum behavior.

6.4 Long-Running Transactions That Block Cleanup

SELECT pid, usename, state, xact_start,
       now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;  -- oldest first

Long-running transactions are vacuum’s worst enemy. This query helps you identify problematic sessions.

6.5 Indexes Largest First

SELECT
  c.relname AS index_name,
  i.indrelid::regclass AS table_name,
  pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog','information_schema')
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 30;

Large indexes often indicate bloat and represent opportunities for significant space reclamation.

6.6 Precise Bloat Estimates (Install Extension)

-- superuser or suitably privileged role
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Heap bloat estimate
SELECT * FROM pgstattuple('public.your_table');

-- Index bloat estimate (btree)
SELECT * FROM pgstatindex('public.your_index');

The pgstattuple extension provides detailed bloat analysis when you need precise measurements.

7. Fixing Existing Bloat (From Least to Most Disruptive)

Always try low-impact options first, schedule heavier operations during maintenance windows, and measure before and after each intervention.

VACUUM (ANALYZE)

This is your first line of defense. VACUUM frees reusable space inside tables without shrinking files, and ANALYZE updates planner statistics for better query performance.

VACUUM (VERBOSE, ANALYZE) public.your_table;

This operation improves the visibility map and provides immediate feedback about the cleanup process.

Tighten Autovacuum per Hot Table (Reloptions)

Rather than changing global settings, tune autovacuum for specific problematic tables:

ALTER TABLE public.your_table
  SET (autovacuum_vacuum_scale_factor = 0.02,  -- 2%
       autovacuum_vacuum_threshold     = 50,
       autovacuum_analyze_scale_factor = 0.05,
       autovacuum_enabled              = true);

Follow this with VACUUM (ANALYZE) once to reset statistics and activate the new settings.

Reclaim and Compact Heap with a Rewrite

When you need to actually shrink files, you have several options:

VACUUM FULL compacts and shrinks files but requires an exclusive lock that blocks all reads and writes. Use this only during scheduled maintenance windows.

CLUSTER rewrites tables in index order, potentially improving locality for sequential scans, but also requires exclusive locking:

CLUSTER public.your_table USING some_index;

pg_repack provides online rewriting with minimal blocking, making it ideal for production environments:

pg_repack --table=public.your_table --dbname=yourdb

Fix Index Bloat

PostgreSQL 11 has no REINDEX CONCURRENTLY (that arrives in version 12), so rebuilding an index online requires a two-step swap:

-- Rebuild online:
CREATE INDEX CONCURRENTLY idx_new ON public.your_table (col1, col2);
DROP INDEX CONCURRENTLY idx_old;

Alternatively, use pg_repack --index=public.idx_old for online index rebuilding.

Consider removing unused or redundant indexes to reduce future maintenance overhead.
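A starting point for that audit is pg_stat_user_indexes: indexes with zero scans since the last statistics reset are removal candidates (verify first that they don’t back a constraint, and remember the counter resets with the stats):

```sql
-- Never-scanned, non-unique indexes, largest first
SELECT s.schemaname,
       s.relname        AS table_name,
       s.indexrelname   AS index_name,
       s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique          -- keep unique/PK indexes out of the list
ORDER BY pg_relation_size(s.indexrelid) DESC;
```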

TOAST Cleanup

Table rewrite operations (VACUUM FULL, CLUSTER, or pg_repack) also rewrite TOAST storage, compacting out-of-line large values and reclaiming substantial space in tables with large text or binary columns.

8. Preventing Future Bloat

Prevention is always preferable to remediation. Here’s how to design your database and operations to minimize bloat:

8.1 Tune Autovacuum

Scale Factors & Thresholds: Autovacuum triggers a vacuum when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. High-churn tables need smaller scale factors (1–5% instead of the default 20%).
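That formula makes each table’s trigger point easy to compute and compare against its current dead-tuple count (this reads the global settings; any per-table reloptions would override them):

```sql
-- Approximate dead-tuple count at which autovacuum will fire, per table
SELECT s.relname,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::numeric
             + current_setting('autovacuum_vacuum_scale_factor')::numeric
               * c.reltuples) AS vacuum_triggers_at
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 20;
```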

Worker Capacity: Increase autovacuum_max_workers, reduce autovacuum_naptime (smaller values mean more frequent checks), and consider raising autovacuum_vacuum_cost_limit or lowering autovacuum_vacuum_cost_delay if vacuum operations consistently lag behind modification activity.

Per-Table Reloptions: This is the best practice for the hottest tables, allowing surgical tuning without affecting the entire database.

8.2 Encourage HOT Updates

Set fillfactor to reserve space on each page for updates:

ALTER TABLE public.your_table SET (fillfactor = 85);
VACUUM FULL public.your_table;  -- or pg_repack, to apply fillfactor

Design your schema to avoid updating indexed columns when possible. Consider moving frequently changing attributes to separate tables or columns that don’t require indexing.

8.3 Application Patterns

Kill Long Transactions: Set idle_in_transaction_session_timeout and audit application code paths that hold transactions open unnecessarily.
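A sketch of both levers follows; the timeout values and the role name app_user are placeholders, not recommendations:

```sql
-- Server-wide: terminate sessions that sit idle inside an open transaction
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();

-- Or scoped to a known-problematic application role
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';
```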

Batch Deletes with Immediate VACUUM: For queue or log tables:

-- PostgreSQL's DELETE has no LIMIT clause; batch via a ctid subquery
DELETE FROM public.events
WHERE ctid IN (
  SELECT ctid FROM public.events
  WHERE created_at < now() - interval '30 days'
  LIMIT 10000
);
VACUUM (ANALYZE) public.events;
-- repeat in a loop/job until a batch deletes zero rows

Partitioning: Consider partitioning high-churn or time-series tables so you can DETACH/DROP or TRUNCATE old partitions (instant bloat removal).
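A minimal sketch with declarative range partitioning (available since PostgreSQL 10; the table and date bounds are illustrative):

```sql
CREATE TABLE events (
    id         bigint,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Retiring an old month is instant: no dead tuples, no vacuum debt
DROP TABLE events_2024_01;
-- or keep the data but remove it from the parent:
-- ALTER TABLE events DETACH PARTITION events_2024_01;
```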

Right-Size Indexes: Each additional index multiplies write work and bloat potential. Regularly audit your indexes for necessity and efficiency.

8.4 WAL & Checkpoint Hygiene (Keeps I/O Healthy)

Ensure checkpoints aren’t too frequent (causing excessive WAL volume) or too rare (extending recovery time). Monitor pg_stat_bgwriter and tune checkpoint settings appropriately.

Consider wal_compression if WAL volume is high, but test the CPU impact first.

Keep read replicas current, as very delayed replicas can hold back vacuum operations by maintaining old snapshots.
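The knobs discussed above, as a hedged starting point—the values depend entirely on your hardware and workload, so treat them as placeholders to benchmark rather than recommendations:

```sql
ALTER SYSTEM SET checkpoint_timeout = '15min';        -- fewer, larger checkpoints
ALTER SYSTEM SET max_wal_size = '4GB';                -- headroom between checkpoints
ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- spread checkpoint I/O over the interval
ALTER SYSTEM SET wal_compression = on;                -- measure the CPU cost first
SELECT pg_reload_conf();
```

On standbys, note that hot_standby_feedback = on reduces query cancellations on the replica at the cost of letting its snapshots hold back vacuum on the primary—a direct bloat trade-off.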

9. Ongoing Monitoring (Put These on a Dashboard)

9.1 Table/Index Growth and Dead Tuples

-- Top tables by dead tuples
SELECT relid::regclass, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

-- Top relations by total size
SELECT relkind,
       relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog','information_schema')
  AND relkind IN ('r','i')  -- tables and indexes
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;

9.2 Autovacuum Effectiveness

-- Per table: last vacuum/analyze times
SELECT relid::regclass AS table, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY COALESCE(last_autovacuum, last_vacuum) NULLS FIRST
LIMIT 50;

9.3 Long Transactions (Blockers)

SELECT pid, usename, state, now() - xact_start AS age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY age DESC
LIMIT 20;

9.4 I/O & Checkpoints (WAL/Buffer Health)

SELECT * FROM pg_stat_bgwriter;  -- checkpoints, buffers written, backend fsyncs, etc.

Pro Tip: Enable log_autovacuum_min_duration with a small value like 5s and monitor your logs to identify tables that constantly bloat or never receive adequate vacuuming.

10. A Pragmatic Playbook

When facing a bloat problem, follow this systematic approach:

  1. Identify: Run the monitoring queries to find tables with the most dead tuples and largest indexes.
  2. Unblock: Kill or fix long-running transactions and set idle_in_transaction_session_timeout to prevent future occurrences.
  3. Clean Light: Run VACUUM (ANALYZE) on the worst-affected tables and verify that statistics improve.
  4. Tune: Configure per-table autovacuum settings for hot tables and set appropriate fillfactor values.
  5. Reclaim Hard: Schedule pg_repack (preferred for online operation) or VACUUM FULL/CLUSTER during maintenance windows.
  6. Rebuild Indexes: Rebuild the largest and most bloated indexes online using the concurrent index creation and dropping process.
  7. Prevent: Revisit your schema design to avoid updating indexed columns, consider partitioning strategies, and implement routine purge and vacuum jobs.
  8. Observe: Create dashboards to monitor dead tuples, size growth, autovacuum recency, and long transactions on a weekly basis.

11. Handy Command Snippets

Run VACUUM Across a Database

vacuumdb --all --analyze                       # vacuum and analyze every database
vacuumdb --dbname yourdb --analyze --verbose   # a single database, with progress output
# note: --analyze-in-stages only rebuilds planner statistics (no vacuum);
# it is intended for use right after a restore or major-version upgrade
vacuumdb --all --analyze-in-stages

pg_repack Examples

pg_repack --dbname=yourdb --table=public.hot_table
pg_repack --dbname=yourdb --index=public.big_idx

Per-Table Autovacuum Tuning

ALTER TABLE public.hot_table
  SET (autovacuum_vacuum_scale_factor = 0.02,
       autovacuum_vacuum_threshold     = 100,
       autovacuum_analyze_scale_factor = 0.05,
       fillfactor = 85);

12. Key Takeaways

Understanding PostgreSQL bloat is essential for maintaining a healthy, performant database. Remember these crucial points:

Bloat is a normal by-product of MVCC, not a bug—but you must actively manage cleanup operations.

Long-running transactions are your worst enemy—identify and fix these before attempting other remediation strategies.

Autovacuum requires tuning for high-activity tables—default settings are often too conservative for production workloads.

File shrinking requires explicit rewrite operations—plan for pg_repack, VACUUM FULL, or CLUSTER when space reclamation is necessary.

Design for HOT updates—use appropriate fillfactor settings and avoid updating indexed columns when possible.

Monitor WAL, buffer, and I/O health—checkpoint behavior significantly impacts overall database performance.

Make monitoring part of your routine—weekly reviews of bloat metrics prevent small problems from becoming major headaches.

By following these principles and implementing the monitoring and maintenance practices outlined in this guide, you’ll keep your PostgreSQL databases running efficiently regardless of workload intensity. Bloat doesn’t have to be a mystery or a crisis—with the right knowledge and tools, it’s simply another aspect of database administration that you can manage proactively and effectively.
