Blue-Green Deployment for PostgreSQL Upgrade

Enhanced Minimal Downtime Upgrade & Zero Downtime Migration Strategy

PostgreSQL 11 β†’ 15.9

🎯 Objective: Upgrade PostgreSQL from version 11 to 15.9 using a Blue-Green strategy with:

  • βœ… Zero downtime during data replication and testing
  • βœ… Minimal downtime during upgrade
  • βœ… Rollback strategy : promote original master if needed
  • βœ… Safety checks and monitoring
  • βœ… Connection management

πŸ”΅ Blue-Green Concept

  • Blue = Current production server (PostgreSQL 11)
  • Green = New target server (PostgreSQL 11 initially, PostgreSQL 15.9 after upgrade)
  • Use streaming replication to keep Green synced with Blue
  • Promote Green when ready
  • Upgrade Green to PG15.9 after switchover
  • If rollback is needed, re-promote Blue

βœ… Enhanced Step-by-Step Plan

πŸ”Ή Phase 1: Prepare Green (New Server)

  1. Provision new hardware (Green)
  2. Install PostgreSQL 11 (same exact version as Blue)
  3. Match all configuration:
    • postgresql.conf, pg_hba.conf, users, roles, extensions
  4. Create a replication user on Blue (e.g., replicator)
  5. Install PostgreSQL 15.9 binaries (for later upgrade)
  6. Set up connection management:
    • Install PgBouncer or prepare DNS CNAME switching
    • Document application connection string changes

πŸ”Ή Phase 2: Set Up Streaming Replication (PG11 β†’ PG11)

On Blue (current master):

# postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = 8GB  # Updated from wal_keep_segments for PG 11+
hot_standby = on
max_replication_slots = 5

# Create replication slot for better WAL retention
SELECT pg_create_physical_replication_slot('green_slot');

# pg_hba.conf
host replication replicator green_server_ip/32 md5

On Green (new server):

# Stop PostgreSQL
systemctl stop postgresql

# Clear data directory
rm -rf /var/lib/postgresql/11/main/*

# Perform base backup with replication slot
pg_basebackup -h blue_ip -D /var/lib/postgresql/11/main -U replicator -P -R -S green_slot

# Configure recovery settings in postgresql.conf
primary_slot_name = 'green_slot'

# Start PostgreSQL
systemctl start postgresql

βœ… Now Green is a hot standby synced with Blue.


πŸ”Ή Phase 3: Initial Validation & Monitoring Setup

Monitor replication status:

-- On Blue: Check replication status
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- On Green: Check receiver status  
SELECT status, receive_start_lsn, receive_start_tli, received_lsn, received_tli,
       last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time
FROM pg_stat_wal_receiver;

Basic validation:

  • Perform read-only query tests on Green
  • Validate user access and system compatibility
  • Set up monitoring alerts for replication lag > 100MB

πŸ”Ή Phase 3.5: Pre-Switchover Validation

Critical safety checks before switchover:

  1. Dry-run upgrade check:
# Test pg_upgrade compatibility (don't actually upgrade yet)
sudo -u postgres /usr/lib/postgresql/15/bin/pg_upgrade \
  --old-datadir=/var/lib/postgresql/11/main \
  --new-datadir=/var/lib/postgresql/15/main \
  --old-bindir=/usr/lib/postgresql/11/bin \
  --new-bindir=/usr/lib/postgresql/15/bin \
  --check
  1. Extension compatibility validation:
-- On Green: Check all extensions are PG15 compatible
SELECT name, default_version, installed_version 
FROM pg_available_extensions 
WHERE name IN (SELECT extname FROM pg_extension);
  1. Application connectivity test:
# Test read-only connection to Green
psql -h green_ip -U app_user -d app_db -c "SELECT version();"
  1. Replication lag verification:
-- Ensure lag is minimal (< 1MB) before switchover
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes 
FROM pg_stat_replication WHERE application_name = 'green_slot';
  1. Prepare connection switchover:
    • Stage application config changes
    • Prepare PgBouncer config or DNS changes
    • Document rollback connection procedures

πŸ”Ή Phase 4: Enhanced Manual Switchover to Green

Pre-switchover checklist:

  • [ ] Replication lag < 1MB
  • [ ] No replication errors in logs
  • [ ] Application teams notified
  • [ ] Rollback procedures documented
  • [ ] Connection switching method prepared
  • [ ] Connection termination procedures ready

Switchover procedure:

# 1. Set Blue to read-only mode (safer than stopping)
-- On Blue:
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

# 2. Wait for Green to catch up completely
-- Monitor until lag = 0
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) FROM pg_stat_replication;

# 3. Promote Green server with verification
-- On Green:
pg_ctl promote -D /var/lib/postgresql/11/main

# 4. Verify promotion successful
-- On Green:
SELECT pg_is_in_recovery(); -- Should return false

# 5. Switch connections (choose one method):
# Option A: DNS CNAME switch
# Option B: PgBouncer config switch  
# Option C: Application config deployment

# 6. Verify application functionality
-- Test critical application functions
-- Monitor error rates and performance

βœ… Green is now the active master (PG11)

CRITICAL: Immediate Blue Sync Setup (Prevent Blue from becoming stale)

bash
# Option 1: Set up reverse replication immediately (Recommended)
# This ensures Blue stays current for potential rollback

# On Green (new master), create replication user for Blue
CREATE USER blue_replicator REPLICATION LOGIN PASSWORD 'secure_password';

# Configure Green postgresql.conf for reverse replication
echo "max_wal_senders = 6" >> /etc/postgresql/11/main/postgresql.conf  # Increase by 1
SELECT pg_reload_conf();

# Create replication slot for Blue
SELECT pg_create_physical_replication_slot('blue_sync_slot');

# Option 2: Alternative - Set up WAL archiving for Blue sync
# Configure if reverse replication is not preferred
echo "archive_mode = on" >> /etc/postgresql/11/main/postgresql.conf
echo "archive_command = 'rsync %p blue_server:/var/lib/postgresql/wal_archive/%f'" >> /etc/postgresql/11/main/postgresql.conf
SELECT pg_reload_conf();

πŸ”Ή Phase 5: Upgrade Green to PostgreSQL 15.9

CRITICAL: Connection Management Before Upgrade

# 1. Check for active connections (MANDATORY)
-- On Green: Verify no active connections before upgrade
SELECT pid, usename, application_name, client_addr, state, query_start, query
FROM pg_stat_activity 
WHERE state = 'active' AND pid <> pg_backend_pid();

# 2. If connections detected, terminate them safely
-- Option A: Graceful termination (preferred)
SELECT pg_cancel_backend(pid) 
FROM pg_stat_activity 
WHERE state = 'active' AND pid <> pg_backend_pid();

-- Wait 30 seconds, then check again
-- Option B: Force termination if graceful fails
SELECT pg_terminate_backend(pid) 
FROM pg_stat_activity 
WHERE state = 'active' AND pid <> pg_backend_pid();

# 3. Prevent new connections during upgrade
-- Update pg_hba.conf to reject new connections (backup original first)
cp /etc/postgresql/11/main/pg_hba.conf /etc/postgresql/11/main/pg_hba.conf.backup
echo "# UPGRADE MODE - NO CONNECTIONS ALLOWED" > /etc/postgresql/11/main/pg_hba.conf
echo "local all postgres peer" >> /etc/postgresql/11/main/pg_hba.conf
SELECT pg_reload_conf();

# 4. Final connection verification
SELECT count(*) FROM pg_stat_activity WHERE pid <> pg_backend_pid();
-- Should return 0 before proceeding

# 5. Stop PostgreSQL (this is the main downtime period)
systemctl stop postgresql

# 6. Run pg_upgrade (already validated in Phase 3.5)
sudo -u postgres /usr/lib/postgresql/15/bin/pg_upgrade \
  --old-datadir=/var/lib/postgresql/11/main \
  --new-datadir=/var/lib/postgresql/15/main \
  --old-bindir=/usr/lib/postgresql/11/bin \
  --new-bindir=/usr/lib/postgresql/15/bin \
  --link  # Use hard links for faster upgrade

# 7. Update systemd service to use PG15
systemctl stop postgresql@11-main
systemctl start postgresql@15-main
systemctl enable postgresql@15-main

# 8. Restore connection access
-- Restore original pg_hba.conf for PG15
cp /etc/postgresql/11/main/pg_hba.conf.backup /etc/postgresql/15/main/pg_hba.conf
SELECT pg_reload_conf();

# 9. Run post-upgrade scripts
sudo -u postgres ./analyze_new_cluster.sh

# 10. Test application connectivity and functionality
# 11. After validation: sudo -u postgres ./delete_old_cluster.sh

⏱ Downtime: ~5–10 minutes max


πŸ”Ή Phase 6: Enhanced Rollback Plan with Blue Data Sync

Option A: Immediate Rollback (if issues detected within 30 minutes)

bash
# 1. Stop Green server
systemctl stop postgresql@15-main

# 2. Re-enable Blue server  
-- On Blue:
ALTER SYSTEM RESET default_transaction_read_only;
SELECT pg_reload_conf();

# 3. Promote Blue (if needed)
pg_ctl promote -D /var/lib/postgresql/11/main

# 4. Switch connections back to Blue
# Use reverse of connection switching method from Phase 4

Option B: Blue Data Recovery (Prevent Blue from becoming stale)

B1: Continuous Blue Sync Strategy (Recommended)

bash
# After Green promotion, immediately set up reverse replication
# This keeps Blue current with Green's data

# 1. On Green (new master PG15), create replication user
CREATE USER blue_replicator REPLICATION LOGIN PASSWORD 'secure_password';

# 2. Configure Green for replication
-- postgresql.conf on Green
wal_level = replica
max_wal_senders = 5
wal_keep_size = 8GB
max_replication_slots = 5

# Create replication slot for Blue
SELECT pg_create_physical_replication_slot('blue_sync_slot');

# 3. Configure Blue as standby of Green
-- Stop Blue temporarily
systemctl stop postgresql@11-main

# Clear Blue data directory and sync from Green
rm -rf /var/lib/postgresql/11/main/*
pg_basebackup -h green_ip -D /var/lib/postgresql/11/main -U blue_replicator -P -R -S blue_sync_slot

# Configure Blue's recovery settings
echo "primary_slot_name = 'blue_sync_slot'" >> /var/lib/postgresql/11/main/postgresql.conf

# Start Blue as standby
systemctl start postgresql@11-main

B2: WAL Shipping Strategy (Alternative)

bash
# Set up WAL archiving on Green to keep Blue synchronized

# 1. On Green, configure WAL archiving
-- postgresql.conf
archive_mode = on
archive_command = 'rsync %p blue_server:/var/lib/postgresql/wal_archive/%f'

# 2. On Blue, configure WAL replay
-- recovery.conf or postgresql.conf (PG12+)
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'
recovery_target_timeline = 'latest'

Option C: Delayed Rollback with Data Recovery

C1: Point-in-Time Recovery Rollback

bash
# If Blue became stale, use PITR to bring it current

# 1. Stop Green server
systemctl stop postgresql@15-main

# 2. Get the exact LSN where Green was promoted
-- Check Green's promotion LSN from logs
grep "selected new timeline" /var/log/postgresql/postgresql-15-main.log

# 3. Perform PITR on Blue to catch up
-- On Blue: Configure recovery to specific point
echo "recovery_target_lsn = 'LSN_FROM_GREEN_PROMOTION'" > /var/lib/postgresql/11/main/recovery.conf
echo "recovery_target_timeline = 'latest'" >> /var/lib/postgresql/11/main/recovery.conf

# 4. Start Blue with recovery
systemctl start postgresql@11-main

# 5. Promote Blue when recovery completes
pg_ctl promote -D /var/lib/postgresql/11/main

C2: Logical Replication Catch-up (For Large Data Gaps)

bash
# Use logical replication to sync specific tables/changes

# 1. On Green, create publication for critical tables
CREATE PUBLICATION blue_catchup FOR ALL TABLES;

# 2. On Blue, create subscription
CREATE SUBSCRIPTION blue_sync 
CONNECTION 'host=green_ip dbname=mydb user=replicator' 
PUBLICATION blue_catchup;

# 3. Monitor synchronization
SELECT * FROM pg_stat_subscription;

# 4. When caught up, promote Blue
pg_ctl promote -D /var/lib/postgresql/11/main

Rollback Decision Matrix:

Time Since SwitchoverData VolumeRecommended Strategy
< 30 minutesAnyOption A: Direct rollback
30 min – 2 hoursLow-MediumOption B1: Continuous sync
30 min – 2 hoursHighOption B2: WAL shipping
> 2 hoursLowOption C2: Logical replication
> 2 hoursHighOption C1: PITR recovery

Enhanced Blue Sync Monitoring:

sql
-- Monitor Blue sync status (when using continuous sync)
-- On Green:
SELECT client_addr, application_name, state, 
       pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS lag_bytes
FROM pg_stat_replication WHERE application_name = 'blue_sync';

-- On Blue:
SELECT status, received_lsn, latest_end_lsn,
       pg_wal_lsn_diff(latest_end_lsn, received_lsn) AS lag_bytes
FROM pg_stat_wal_receiver;

πŸ”Ή Phase 7: Enhanced Monitoring & Validation

Post-upgrade monitoring (first 24 hours):

  • Application error rates
  • Query performance metrics
  • Replication lag (if standby is re-established)
  • Resource utilization
  • Extension functionality

Performance validation:

-- Compare query performance before/after
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Check for any deprecated features
SELECT * FROM pg_stat_user_tables WHERE schemaname NOT IN ('information_schema', 'pg_catalog');

πŸ”Ή Phase 8: Finalization & Reuse

After confirming success:

  • Clean up PostgreSQL 11 binaries on Green
  • Optionally reconfigure Blue as:
    • Hot standby (replica of PG15) – recommended
    • Read-only server for reporting
    • Backup/DR server
  • Drop replication slot if not reusing: SELECT pg_drop_replication_slot('green_slot');

⏱️ Enhanced Downtime Summary

StepDowntimeRisk Level
Streaming Replication Setup0Low
Pre-switchover Validation0Low
Promote Green~30 secondsMedium
Connection Switchover~1-2 minutesMedium
Connection Termination & Validation~2-3 minutesMedium
pg_upgrade Execution~5–10 minutesHigh
Total Planned Downtime~8-16 minutes
Rollback (if needed)~1–3 minutesHigh

πŸ§ͺ Enhanced Key Commands Recap

Setup & Monitoring:

  • pg_basebackup -S slot_name: Clone data with replication slot
  • pg_create_physical_replication_slot(): Better WAL retention
  • pg_stat_replication: Monitor sync status
  • pg_wal_lsn_diff(): Calculate replication lag
  • Blue sync commands: Reverse replication setup
  • WAL archiving: archive_mode and restore_command
  • PITR recovery: recovery_target_lsn and timeline management

Validation & Safety:

  • pg_upgrade --check: Dry-run validation
  • pg_is_in_recovery(): Verify promotion status
  • ALTER SYSTEM SET default_transaction_read_only: Safe read-only mode
  • pg_stat_activity: Monitor active connections
  • pg_cancel_backend() / pg_terminate_backend(): Connection management

Upgrade & Control:

  • pg_ctl promote: Manual promotion
  • pg_upgrade --link: Fast major version upgrade
  • analyze_new_cluster.sh: Post-upgrade optimization

βœ… Enhanced Benefits

  • βœ”οΈ Comprehensive safety checks before each critical step
  • βœ”οΈ Detailed monitoring and validation procedures
  • βœ”οΈ Improved connection management options
  • βœ”οΈ Enhanced rollback strategy with multiple scenarios
  • βœ”οΈ Blue data sync prevention – keeps Blue current for rollback
  • βœ”οΈ Multiple rollback options based on timing and data volume
  • βœ”οΈ Point-in-time recovery capabilities for delayed rollbacks
  • βœ”οΈ Logical replication for selective data synchronization
  • βœ”οΈ Replication slots for better WAL management
  • βœ”οΈ Read-only mode instead of stopping Blue (safer)
  • βœ”οΈ Pre-upgrade validation to catch issues early
  • βœ”οΈ Detailed runbooks to minimize human error
  • βœ”οΈ No external tools dependency (no HAProxy)
  • βœ”οΈ Full control over cutover and recovery process

🚨 Critical Success Factors

  1. Test the entire procedure in staging environment first
  2. Monitor replication lag continuously during setup
  3. Validate application compatibility with read-only Green before switchover
  4. Have clear communication plan with application teams
  5. Practice rollback procedures before go-live
  6. Set up proper monitoring for post-upgrade validation

πŸ“š References & Documentation

Official PostgreSQL Documentation

  • PostgreSQL 15 Release Notes: https://www.postgresql.org/docs/15/release-15.html
  • pg_upgrade Documentation: https://www.postgresql.org/docs/15/pgupgrade.html
  • Streaming Replication: https://www.postgresql.org/docs/15/warm-standby.html#STREAMING-REPLICATION
  • pg_basebackup: https://www.postgresql.org/docs/15/app-pgbasebackup.html
  • Replication Slots: https://www.postgresql.org/docs/15/warm-standby.html#STREAMING-REPLICATION-SLOTS
  • Monitoring Statistics: https://www.postgresql.org/docs/15/monitoring-stats.html

PostgreSQL Upgrade Guides

  • Major Version Upgrade Methods: https://www.postgresql.org/docs/15/upgrading.html
  • Upgrade Best Practices: https://wiki.postgresql.org/wiki/Upgrading
  • pg_upgrade Tutorial: https://www.postgresql.org/docs/15/pgupgrade.html
  • Version Compatibility Matrix: https://www.postgresql.org/support/versioning/

High Availability & Replication

  • High Availability Guide: https://www.postgresql.org/docs/15/high-availability.html
  • Replication Configuration: https://www.postgresql.org/docs/15/runtime-config-replication.html
  • WAL Configuration: https://www.postgresql.org/docs/15/runtime-config-wal.html
  • Backup and Recovery: https://www.postgresql.org/docs/15/backup.html

PostgreSQL Configuration

  • Configuration Parameters: https://www.postgresql.org/docs/15/runtime-config.html
  • pg_hba.conf Documentation: https://www.postgresql.org/docs/15/auth-pg-hba-conf.html
  • Client Authentication: https://www.postgresql.org/docs/15/client-authentication.html

Monitoring & Troubleshooting

  • System Views: https://www.postgresql.org/docs/15/monitoring-stats.html#MONITORING-STATS-VIEWS
  • pg_stat_replication: https://www.postgresql.org/docs/15/monitoring-stats.html#MONITORING-STATS-REPLICATION
  • Activity Monitoring: https://www.postgresql.org/docs/15/monitoring-stats.html#MONITORING-STATS-ACTIVITY
  • WAL Monitoring: https://www.postgresql.org/docs/15/functions-info.html#FUNCTIONS-INFO-MISC

Tools & Utilities

  • PostgreSQL Utilities: https://www.postgresql.org/docs/15/reference-server.html
  • pg_ctl Documentation: https://www.postgresql.org/docs/15/app-pg-ctl.html
  • PostgreSQL System Catalogs: https://www.postgresql.org/docs/15/catalogs.html

Best Practices & Community Guides

  • PostgreSQL Wiki: https://wiki.postgresql.org/wiki/Main_Page
  • Upgrade Checklist: https://wiki.postgresql.org/wiki/Upgrade_Checklist
  • Blue-Green Deployment Patterns: https://martinfowler.com/bliki/BlueGreenDeployment.html
  • Database Reliability Engineering: https://www.oreilly.com/library/view/database-reliability-engineering/9781491925935/

Community Resources

  • PostgreSQL Mailing Lists: https://www.postgresql.org/list/
  • PostgreSQL Planet (Blogs): https://planet.postgresql.org/
  • PostgreSQL Slack Community: https://postgres-slack.herokuapp.com/
  • Stack Overflow PostgreSQL: https://stackoverflow.com/questions/tagged/postgresql

Vendor-Specific Guides

  • AWS RDS PostgreSQL Upgrades: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.PostgreSQL.html
  • Google Cloud SQL Upgrades: https://cloud.google.com/sql/docs/postgres/upgrade-major-db-version
  • Azure Database Upgrades: https://docs.microsoft.com/en-us/azure/postgresql/how-to-upgrade-using-dump-and-restore

Benchmarking & Performance

  • pgbench Documentation: https://www.postgresql.org/docs/15/pgbench.html
  • PostgreSQL Performance Tuning: https://wiki.postgresql.org/wiki/Performance_Optimization
  • Query Performance: https://www.postgresql.org/docs/15/using-explain.html

Security & Compliance

  • PostgreSQL Security: https://www.postgresql.org/docs/15/security.html
  • SSL Configuration: https://www.postgresql.org/docs/15/ssl-tcp.html
  • User Management: https://www.postgresql.org/docs/15/user-manag.html

πŸ“– Additional Reading

Books

  • “PostgreSQL: Up and Running” by Regina Obe & Leo Hsu
  • “PostgreSQL High Availability Cookbook” by Shaun Thomas
  • “Mastering PostgreSQL” by Hans-JΓΌrgen SchΓΆnig

Blogs & Articles

  • 2ndQuadrant Blog: https://www.2ndquadrant.com/en/blog/
  • Percona PostgreSQL Blog: https://www.percona.com/blog/category/postgresql/
  • EDB Blog: https://www.enterprisedb.com/blog
  • Citus Data Blog: https://www.citusdata.com/blog/

Training & Certification

  • PostgreSQL Training: https://www.postgresql.org/support/professional_training/
  • EDB Training: https://www.enterprisedb.com/training
  • Postgres Professional: https://postgrespro.com/education

πŸ› οΈ Tools & Scripts Repository

Useful Scripts

# Replication monitoring script
#!/bin/bash
# replication_monitor.sh
while true; do
  psql -h primary_ip -c "SELECT client_addr, state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag FROM pg_stat_replication;"
  sleep 10
done

# Connection checker script
#!/bin/bash
# check_connections.sh
psql -t -c "SELECT count(*) FROM pg_stat_activity WHERE pid <> pg_backend_pid();"

Monitoring Queries Collection

-- Save as monitoring_queries.sql
-- Active connections
SELECT pid, usename, application_name, client_addr, state, query_start 
FROM pg_stat_activity WHERE state = 'active';

-- Replication lag
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes 
FROM pg_stat_replication;

-- Database sizes
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size 
FROM pg_database ORDER BY pg_database_size(datname) DESC;

πŸ“ž Support & Community

Getting Help

  • PostgreSQL IRC: #postgresql on Freenode
  • PostgreSQL Forums: https://www.postgresql.org/list/
  • Reddit: https://www.reddit.com/r/PostgreSQL/
  • Discord: PostgreSQL Community Server

Professional Support

  • PostgreSQL Commercial Support: https://www.postgresql.org/support/professional_support/
  • EDB Support: https://www.enterprisedb.com/support
  • 2ndQuadrant Support: https://www.2ndquadrant.com/support/


Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>