1. Introduction
In high-concurrency PostgreSQL environments—such as financial systems, e-commerce platforms, or large-scale SaaS products—deadlocks represent one of the most critical blocking issues that can halt mission-critical workloads and severely impact system reliability.
A deadlock occurs when two or more transactions create a circular dependency by waiting indefinitely for resources (rows, tables, or advisory locks) that are locked by each other. While PostgreSQL includes sophisticated built-in deadlock detection mechanisms, relying solely on detection means that downtime, failed transactions, and performance degradation have already occurred by the time the system responds.
The Business Impact
Deadlocks in production systems can cause:
- Revenue Loss: Failed payment transactions, abandoned shopping carts
- Data Integrity Issues: Partial updates leaving systems in inconsistent states
- Cascading Failures: One deadlock triggering a cascade of timeouts and retries
- Poor User Experience: Slow response times and transaction failures
- Operational Overhead: Manual intervention and emergency hotfixes
For enterprise systems processing thousands of transactions per second, the goal should be prevention first, detection second. A proactive approach combining architectural patterns, monitoring, and tuning can eliminate the large majority of deadlock incidents.
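Even with good prevention, some deadlocks will slip through, and the application sees them as an error with SQLSTATE 40P01. The right response is to roll back and replay the whole transaction. Below is a minimal retry sketch; the default `is_deadlock` predicate assumes a psycopg2-style `pgcode` attribute on the exception, so adapt it to your driver:

```python
import random
import time

def run_with_deadlock_retry(txn_fn, max_retries=3, is_deadlock=None, sleep=time.sleep):
    """Run txn_fn() and retry with jittered exponential backoff when it
    fails with a deadlock error (SQLSTATE 40P01).

    is_deadlock(exc) decides whether an exception is a deadlock; the default
    checks the pgcode attribute that psycopg2 sets on its errors. txn_fn must
    wrap one whole transaction (BEGIN..COMMIT) so a retry replays all of it.
    """
    if is_deadlock is None:
        is_deadlock = lambda e: getattr(e, "pgcode", None) == "40P01"
    for attempt in range(max_retries):
        try:
            return txn_fn()
        except Exception as exc:
            if not is_deadlock(exc) or attempt == max_retries - 1:
                raise
            # Jittered backoff desynchronizes the transactions that collided
            sleep((2 ** attempt) * 0.05 + random.uniform(0, 0.05))
```

Keeping the retry at the transaction boundary matters: retrying a single statement inside an aborted transaction cannot succeed.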
PostgreSQL’s Lock Architecture
PostgreSQL uses a multi-layered locking system:
- MVCC (Multi-Version Concurrency Control): Allows concurrent reads without blocking
- Row-level locks: Protect individual rows during updates
- Table-level locks: Control schema changes and bulk operations
- Advisory locks: Application-defined custom locks
- Page-level locks: Internal buffer management (usually transparent)
Understanding these layers is crucial for effective deadlock prevention.
2. How Deadlocks Form in PostgreSQL
PostgreSQL’s MVCC system allows concurrent reads and writes without blocking in most scenarios. However, when transactions require exclusive locks—whether on rows, tables, or custom advisory resources—contention can escalate into deadlock cycles.
2.1 The Anatomy of a Deadlock
Deadlocks require four conditions (Coffman conditions):
- Mutual Exclusion: Resources cannot be shared
- Hold and Wait: Processes hold resources while waiting for others
- No Preemption: Resources cannot be forcibly taken away
- Circular Wait: A circular chain of processes waiting for each other
2.2 Common Deadlock Scenarios
Classic Two-Transaction Deadlock
```sql
-- Transaction A (Session 1)
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Acquires exclusive lock on row 1
-- ... some business logic ...
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- Waits for exclusive lock on row 2

-- Transaction B (Session 2) - running concurrently
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- Acquires exclusive lock on row 2
-- ... some business logic ...
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- Waits for exclusive lock on row 1
```
Result: Transaction A holds lock on row 1 and waits for row 2. Transaction B holds lock on row 2 and waits for row 1. Circular dependency detected.
Foreign Key Deadlock
```sql
-- Transaction A
BEGIN;
INSERT INTO orders (customer_id, amount) VALUES (123, 500.00);  -- Acquires shared lock on customers(123)
UPDATE customers SET last_order_date = NOW() WHERE id = 123;    -- Waits for exclusive lock

-- Transaction B
BEGIN;
UPDATE customers SET credit_limit = 5000 WHERE id = 123;        -- Acquires exclusive lock on customers(123)
INSERT INTO orders (customer_id, amount) VALUES (123, 200.00);  -- Waits for shared lock
```
Index Deadlock (B-tree Splits)
```sql
-- High-frequency inserts on indexed columns can cause deadlocks during B-tree page splits
-- Transaction A inserts into page X, triggers split, needs lock on page Y
-- Transaction B inserts into page Y, triggers split, needs lock on page X
```
SELECT FOR UPDATE Deadlock
```sql
-- Transaction A
BEGIN;
SELECT * FROM inventory WHERE product_id = 'PROD1' FOR UPDATE;  -- Locks PROD1
SELECT * FROM inventory WHERE product_id = 'PROD2' FOR UPDATE;  -- Waits for PROD2

-- Transaction B
BEGIN;
SELECT * FROM inventory WHERE product_id = 'PROD2' FOR UPDATE;  -- Locks PROD2
SELECT * FROM inventory WHERE product_id = 'PROD1' FOR UPDATE;  -- Waits for PROD1
```
2.3 Lock Upgrades and Deadlock Risk
Unlike some databases, PostgreSQL does not escalate row locks to table locks: row-lock state lives in the tuple header on disk, so holding many row locks cannot exhaust shared memory (max_locks_per_transaction bounds only table-level, advisory, and similar lock-table entries). The escalation risk that matters is the lock upgrade: a transaction that holds a shared lock on a row (from SELECT ... FOR SHARE or an implicit foreign-key check) and later requests an exclusive lock on the same row must wait for every other shared holder. Two transactions upgrading the same row each wait for the other's shared lock to clear, which is an immediate deadlock.
3. Detection Mechanisms
PostgreSQL’s deadlock detector uses a sophisticated algorithm to identify cycles in the wait-for graph. Understanding its behavior is essential for tuning detection performance.
3.1 Deadlock Detection Algorithm
The detector is not a background process; a backend that has waited on a lock for deadlock_timeout runs the check itself. The check:
- Builds the Wait-For Graph: Maps which transactions are waiting for which resources
- Detects Cycles: Uses depth-first search to find circular dependencies
- Selects a Victim: The backend that ran the check aborts its own transaction (PostgreSQL does not weigh which transaction has done the least work)
- Cleans Up: Releases locks and returns a deadlock error (SQLSTATE 40P01) to the aborted session
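The cycle-detection step can be illustrated with a toy model. This sketch assumes each blocked backend waits on exactly one lock holder; the real detector walks a full wait-for graph with multiple edges and lock modes:

```python
def find_deadlock_cycle(wait_for):
    """wait_for maps a blocked pid to the pid it is waiting on (simplified:
    one outgoing edge per backend). Returns the pids forming a cycle, or None."""
    for start in wait_for:
        path = []
        node = start
        while node in wait_for:
            if node in path:
                # The tail of the path from the repeated node is the cycle
                return path[path.index(node):]
            path.append(node)
            node = wait_for[node]
    return None
```

For example, `{101: 102, 102: 101}` (two backends waiting on each other) yields a cycle, while a straight chain of waiters does not.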
3.2 Key Configuration Parameters
deadlock_timeout (default: 1s)
```sql
-- View current setting
SHOW deadlock_timeout;

-- Tune for faster detection in high-frequency systems
ALTER SYSTEM SET deadlock_timeout = '200ms';
SELECT pg_reload_conf();
```
Considerations:
- Lower values (100-500ms): Faster detection but more CPU overhead
- Higher values (2-5s): Less overhead but longer wait times
- Very low values (<100ms): The detection pass runs on every lock wait that outlasts the timeout, so CPU overhead can become significant on heavily loaded systems
log_lock_waits (default: off)
```sql
-- Enable lock wait logging
ALTER SYSTEM SET log_lock_waits = 'on';
SELECT pg_reload_conf();
```
Log Output Example:
```
LOG:  process 12345 still waiting for ShareLock on transaction 67890 after 1000.123 ms
DETAIL:  Process holding the lock: 12346. Wait queue: 12345.
STATEMENT:  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
```
Additional Monitoring Parameters
```sql
-- Log slow queries that might hold locks too long
ALTER SYSTEM SET log_min_duration_statement = '1000ms';

-- Log all DDL operations that acquire strong locks
ALTER SYSTEM SET log_statement = 'ddl';
```
3.3 Real-Time Lock Monitoring Queries
Current Lock Status
```sql
SELECT
    pl.pid,
    pl.locktype,
    pl.mode,
    pl.granted,
    pl.relation::regclass AS table_name,
    pl.page,
    pl.tuple,
    pl.virtualtransaction,
    psa.query,
    psa.state,
    psa.wait_event_type,
    psa.wait_event,
    NOW() - psa.query_start AS duration
FROM pg_locks pl
JOIN pg_stat_activity psa ON pl.pid = psa.pid
ORDER BY pl.granted ASC, psa.query_start ASC;
```
Blocking Relationships
```sql
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement,
    blocked_activity.application_name AS blocked_application,
    blocking_activity.application_name AS blocking_application,
    blocked_locks.locktype AS blocked_locktype,
    blocked_locks.mode AS blocked_mode,
    blocking_locks.mode AS blocking_mode,
    NOW() - blocked_activity.query_start AS blocked_duration
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```
Lock Wait Statistics
```sql
-- Create a monitoring view for regular checking
CREATE OR REPLACE VIEW lock_monitor AS
SELECT
    locktype,
    mode,
    COUNT(*) AS lock_count,
    COUNT(*) FILTER (WHERE NOT granted) AS waiting_count,
    AVG(EXTRACT(EPOCH FROM (NOW() - query_start))) AS avg_wait_time
FROM pg_locks pl
JOIN pg_stat_activity psa ON pl.pid = psa.pid
GROUP BY locktype, mode
ORDER BY waiting_count DESC, avg_wait_time DESC;

-- Usage
SELECT * FROM lock_monitor WHERE waiting_count > 0;
```
4. Prevention Strategies
Prevention is always more effective than detection. Here are comprehensive strategies organized by complexity and impact.
4.1 Enforce Consistent Lock Ordering
The most effective deadlock prevention technique is establishing and enforcing a system-wide lock ordering convention.
Implementation Approaches
Primary Key Ordering:
```sql
-- Bad: Inconsistent ordering
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 456;
UPDATE accounts SET balance = balance + 100 WHERE id = 123;
COMMIT;

-- Good: Always lock lower ID first
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 123;  -- Lower ID first
UPDATE accounts SET balance = balance - 100 WHERE id = 456;  -- Higher ID second
COMMIT;
```
Application-Level Implementation:
```python
def transfer_funds(from_id, to_id, amount):
    # Always process accounts in ascending ID order
    first_id, second_id = (from_id, to_id) if from_id < to_id else (to_id, from_id)

    with transaction():
        # Lock accounts in consistent order
        if first_id == from_id:
            debit_account(from_id, amount)
            credit_account(to_id, amount)
        else:
            credit_account(to_id, amount)
            debit_account(from_id, amount)
```
Table-Level Ordering:
```sql
-- Define explicit table hierarchy
-- Level 1: Reference tables (customers, products)
-- Level 2: Transactional tables (orders, payments)
-- Level 3: Derived tables (analytics, logs)

-- Always lock in hierarchical order:
BEGIN;
LOCK TABLE customers IN ROW EXCLUSIVE MODE;
LOCK TABLE orders IN ROW EXCLUSIVE MODE;
LOCK TABLE order_items IN ROW EXCLUSIVE MODE;
-- ... perform operations ...
COMMIT;
```
Complex Multi-Table Operations:
```sql
-- Create a stored procedure that enforces ordering
CREATE OR REPLACE FUNCTION safe_order_update(
    p_customer_id INT,
    p_product_ids INT[],
    p_quantities INT[]
) RETURNS VOID AS $$
DECLARE
    sorted_products INT[];
BEGIN
    -- Sort product IDs to ensure consistent locking order
    SELECT ARRAY_AGG(unnest ORDER BY unnest)
    INTO sorted_products
    FROM unnest(p_product_ids);

    -- Lock customer first (lowest hierarchy)
    PERFORM 1 FROM customers WHERE id = p_customer_id FOR UPDATE;

    -- Lock products in sorted order
    PERFORM 1 FROM products WHERE id = ANY(sorted_products) ORDER BY id FOR UPDATE;

    -- Perform business logic...
    -- UPDATE operations here
END;
$$ LANGUAGE plpgsql;
```
4.2 Minimize Transaction Scope and Duration
Long-running transactions dramatically increase deadlock probability by holding locks longer and creating more opportunities for conflicts.
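The same discipline can be enforced from application code. As a sketch (the connection object, threshold, and warning hook here are illustrative, not any specific driver's API), a context manager can time how long each transaction held its locks and flag the ones that exceed a budget:

```python
import time
from contextlib import contextmanager

@contextmanager
def bounded_transaction(conn, warn_after=0.5, clock=time.monotonic, warn=print):
    """Commit or roll back around a block and flag transactions that held
    locks longer than warn_after seconds. conn is any DB-API-style
    connection; the threshold is an application budget, not a PostgreSQL
    setting."""
    start = clock()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        held = clock() - start
        if held > warn_after:
            warn(f"transaction held locks for {held:.2f}s (budget {warn_after}s)")
```

Wiring such a guard into the data-access layer surfaces slow transactions in development, before they become deadlock fuel in production.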
Transaction Boundary Optimization
Anti-Pattern: Long Transaction:
```sql
BEGIN;

-- Step 1: Update inventory (holds locks)
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 'PROD123';

-- Step 2: Call external payment API (network I/O - very slow!)
SELECT process_payment_external('card123', 99.99);  -- 2-5 seconds

-- Step 3: Update order status
UPDATE orders SET status = 'paid' WHERE id = 12345;

-- Step 4: Send confirmation email (more network I/O)
SELECT send_email_external('user@email.com', 'Order confirmed');  -- 1-3 seconds

COMMIT;
-- Total transaction time: 3-8 seconds with locks held
```
Best Practice: Minimal Transaction:
```sql
-- Step 1: Pre-validate outside transaction
SELECT quantity FROM inventory WHERE product_id = 'PROD123';
-- Check if sufficient inventory available

-- Step 2: Minimal transaction for data changes only
BEGIN;
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 'PROD123';
INSERT INTO orders (customer_id, product_id, status)
VALUES (123, 'PROD123', 'pending');
SELECT order_id FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 1;
COMMIT;
-- Transaction time: 10-50ms

-- Step 3: External operations outside transaction
SELECT process_payment_external('card123', 99.99);

-- Step 4: Final update transaction
BEGIN;
UPDATE orders SET status = 'paid' WHERE id = @order_id;
COMMIT;

-- Step 5: Async email sending
SELECT queue_email_async('user@email.com', 'Order confirmed');
```
Connection Pooling Considerations
```python
# Bad: Long-lived connection holds transaction
def process_order():
    conn = get_connection()
    conn.execute("BEGIN")
    # Multiple operations with user input
    product_id = input("Enter product ID: ")  # User interaction!
    conn.execute(
        "UPDATE inventory SET quantity = quantity - 1 WHERE product_id = %s",
        [product_id],
    )
    # More user interaction
    confirm = input("Confirm order? (y/n): ")
    if confirm == 'y':
        conn.execute("INSERT INTO orders ...")
        conn.execute("COMMIT")
    else:
        conn.execute("ROLLBACK")

# Good: Acquire connection only when needed
def process_order():
    product_id = input("Enter product ID: ")   # Outside transaction
    confirm = input("Confirm order? (y/n): ")  # Outside transaction
    if confirm == 'y':
        with get_connection() as conn:  # Brief connection usage
            conn.execute("BEGIN")
            conn.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE product_id = %s",
                [product_id],
            )
            conn.execute("INSERT INTO orders ...")
            conn.execute("COMMIT")
```
4.3 Reduce Lock Granularity and Scope
Using the most specific locks possible reduces contention and deadlock probability.
Row-Level Locking Strategies
SELECT FOR UPDATE Variants:
```sql
-- Standard FOR UPDATE: Blocks other transactions
SELECT * FROM inventory WHERE product_id = 'PROD123' FOR UPDATE;

-- FOR UPDATE SKIP LOCKED: Skip locked rows (useful for job queues)
SELECT * FROM job_queue
WHERE status = 'pending'
ORDER BY priority DESC, created_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- FOR UPDATE NOWAIT: Fail immediately if row is locked
BEGIN;
SELECT * FROM accounts WHERE id = 123 FOR UPDATE NOWAIT;
-- If row is locked, get immediate error instead of waiting
```
Conditional Locking:
```sql
-- Lock only when necessary
DO $$
DECLARE
    current_balance DECIMAL;
    account_id INT := 123;
    transfer_amount DECIMAL := 500.00;
BEGIN
    -- Check balance first (no lock needed for read)
    SELECT balance INTO current_balance
    FROM accounts WHERE id = account_id;

    -- Only acquire lock if transfer is viable
    IF current_balance >= transfer_amount THEN
        -- Lock and double-check (balance might have changed)
        SELECT balance INTO current_balance
        FROM accounts WHERE id = account_id
        FOR UPDATE;

        IF current_balance >= transfer_amount THEN
            UPDATE accounts
            SET balance = balance - transfer_amount
            WHERE id = account_id;
        ELSE
            RAISE EXCEPTION 'Insufficient funds after lock acquisition';
        END IF;
    ELSE
        RAISE EXCEPTION 'Insufficient funds';
    END IF;
END;
$$;
```
Index Design for Reduced Contention
Partition Indexes by Access Pattern:
```sql
-- Bad: Single index on timestamp creates hotspot
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Good: Composite index spreads load
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at);

-- Better: Partial indexes reduce index contention
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';
CREATE INDEX idx_orders_active ON orders(created_at) WHERE status IN ('processing', 'shipped');
```
UUID vs Sequential IDs:
```sql
-- Sequential IDs create contention on index tail
CREATE TABLE orders_sequential (
    id SERIAL PRIMARY KEY,  -- All inserts contend for same index page
    created_at TIMESTAMP DEFAULT NOW()
);

-- UUIDs distribute inserts across index
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE orders_distributed (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),  -- Distributed across index
    created_at TIMESTAMP DEFAULT NOW()
);
```
4.4 Optimistic Concurrency Control
Instead of acquiring locks early, use version checking and retry logic to handle concurrent modifications.
Version-Based Optimistic Locking
Using xmin for Version Control:
```sql
-- Step 1: Read data with version info
SELECT id, name, balance, xmin AS version
FROM accounts WHERE id = 123;
-- Returns: id=123, name="John", balance=1000, version=12345

-- Step 2: Update with version check
UPDATE accounts
SET balance = 900, updated_at = NOW()
WHERE id = 123 AND xmin = 12345;  -- Only update if version matches

-- Step 3: Check if update succeeded (inside a plpgsql function)
GET DIAGNOSTICS rows_affected = ROW_COUNT;
IF rows_affected = 0 THEN
    RAISE EXCEPTION 'Concurrent modification detected - please retry';
END IF;
```
Application-Level Version Fields:
```sql
-- Add version column to tables
ALTER TABLE accounts ADD COLUMN version_number INTEGER DEFAULT 1;

-- Update with version increment
UPDATE accounts
SET balance = 900,
    version_number = version_number + 1,
    updated_at = NOW()
WHERE id = 123 AND version_number = @expected_version;
```
Retry Logic Implementation:
```python
import time
import random

def optimistic_update(account_id, new_balance, max_retries=3):
    for attempt in range(max_retries):
        try:
            # Read current state
            cursor.execute("""
                SELECT balance, version_number
                FROM accounts WHERE id = %s
            """, [account_id])
            current_balance, current_version = cursor.fetchone()

            # Perform business logic validation
            if new_balance < 0:
                raise ValueError("Negative balance not allowed")

            # Attempt optimistic update
            cursor.execute("""
                UPDATE accounts
                SET balance = %s, version_number = version_number + 1
                WHERE id = %s AND version_number = %s
            """, [new_balance, account_id, current_version])

            if cursor.rowcount == 0:
                # Concurrent modification detected
                if attempt < max_retries - 1:
                    # Exponential backoff with jitter
                    delay = (2 ** attempt) + random.uniform(0, 1)
                    time.sleep(delay)
                    continue
                else:
                    raise ConcurrentModificationError("Max retries exceeded")

            # Success
            return True
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            continue

    return False
```
4.5 Advisory Locks for Custom Synchronization
PostgreSQL’s advisory locks provide application-level synchronization without interfering with data locks.
Basic Advisory Lock Usage
Session-Level Advisory Locks:
```sql
-- Acquire lock (blocks if already held)
SELECT pg_advisory_lock(12345);

-- Try to acquire lock (returns immediately)
SELECT pg_try_advisory_lock(12345);
-- Returns true if acquired, false if not

-- Release lock
SELECT pg_advisory_unlock(12345);

-- Check current advisory locks
SELECT locktype, classid, objid, mode, granted
FROM pg_locks
WHERE locktype = 'advisory';
```
Transaction-Level Advisory Locks:
```sql
BEGIN;
-- Lock is automatically released at transaction end
SELECT pg_advisory_xact_lock(67890);

-- Perform operations that need coordination
UPDATE global_counters SET value = value + 1 WHERE name = 'order_sequence';

COMMIT;  -- Lock automatically released
```
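Advisory lock functions key on a single bigint (or a pair of int4s), so application code needs a stable mapping from resource names to keys. A minimal sketch, assuming a truncated SHA-256 rather than PostgreSQL's hashtext(); any hash works as long as every client derives keys the same way:

```python
import hashlib

def advisory_key(name: str) -> int:
    """Derive a stable signed 64-bit key for pg_advisory_xact_lock from a
    resource name. This is not PostgreSQL's hashtext(); it only has to be
    deterministic and shared by all clients."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # First 8 bytes, interpreted as a signed big-endian integer, fit bigint
    return int.from_bytes(digest[:8], "big", signed=True)

# The key is then passed as the bigint argument, e.g.:
#   cursor.execute("SELECT pg_advisory_xact_lock(%s)", [advisory_key("orders:123")])
```

Namespacing the name (`"orders:123"` rather than `"123"`) avoids accidental collisions between unrelated resource types.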
Practical Advisory Lock Patterns
Singleton Process Enforcement:
```sql
-- Ensure only one instance of a batch job runs
CREATE OR REPLACE FUNCTION run_daily_reports() RETURNS VOID AS $$
BEGIN
    -- Try to acquire exclusive lock for this job
    IF NOT pg_try_advisory_lock(hashtext('daily_reports')) THEN
        RAISE NOTICE 'Daily reports job already running, skipping...';
        RETURN;
    END IF;

    -- Perform the work
    RAISE NOTICE 'Starting daily reports generation...';

    -- Generate reports here
    INSERT INTO daily_sales_report SELECT ...;
    UPDATE report_status SET last_run = NOW() WHERE report_name = 'daily_sales';

    RAISE NOTICE 'Daily reports completed';

    -- Session-level locks are NOT released when the function ends, so
    -- unlock explicitly (or use pg_try_advisory_xact_lock instead)
    PERFORM pg_advisory_unlock(hashtext('daily_reports'));
END;
$$ LANGUAGE plpgsql;
```
Resource Pool Management:
```sql
-- Implement a custom semaphore for limiting concurrent operations
CREATE OR REPLACE FUNCTION acquire_processing_slot(max_slots INT DEFAULT 5)
RETURNS INT AS $$
DECLARE
    slot_id INT;
BEGIN
    -- Try to acquire one of N available slots
    FOR slot_id IN 1..max_slots LOOP
        IF pg_try_advisory_lock(hashtext('processing_slot'), slot_id) THEN
            RETURN slot_id;  -- Successfully acquired slot
        END IF;
    END LOOP;

    RETURN 0;  -- No slots available
END;
$$ LANGUAGE plpgsql;

-- Usage in application
DO $$
DECLARE
    my_slot INT;
BEGIN
    my_slot := acquire_processing_slot(3);  -- Max 3 concurrent processes

    IF my_slot > 0 THEN
        RAISE NOTICE 'Acquired processing slot %', my_slot;
        -- Do heavy processing work
        PERFORM heavy_data_processing();
        -- Release slot
        PERFORM pg_advisory_unlock(hashtext('processing_slot'), my_slot);
    ELSE
        RAISE NOTICE 'No processing slots available, deferring work';
    END IF;
END;
$$;
```
Ordered Queue Processing:
```sql
-- Process queue items in strict order using advisory locks
CREATE OR REPLACE FUNCTION process_queue_item() RETURNS VOID AS $$
DECLARE
    queue_item RECORD;
    lock_key BIGINT;
BEGIN
    -- Find next item to process
    SELECT * INTO queue_item
    FROM job_queue
    WHERE status = 'pending'
    ORDER BY priority DESC, created_at ASC
    LIMIT 1;

    IF queue_item IS NULL THEN
        RETURN;  -- No work to do
    END IF;

    -- Create unique lock key for this item
    lock_key := hashtext('queue_item_' || queue_item.id::TEXT);

    -- Try to acquire exclusive lock for this specific item
    IF NOT pg_try_advisory_lock(lock_key) THEN
        RETURN;  -- Another worker is processing this item
    END IF;

    BEGIN
        -- Mark as processing
        UPDATE job_queue
        SET status = 'processing', started_at = NOW()
        WHERE id = queue_item.id;

        -- Perform the actual work
        PERFORM execute_job(queue_item.job_type, queue_item.job_data);

        -- Mark as completed
        UPDATE job_queue
        SET status = 'completed', completed_at = NOW()
        WHERE id = queue_item.id;
    EXCEPTION WHEN OTHERS THEN
        -- Mark as failed
        UPDATE job_queue
        SET status = 'failed', error_message = SQLERRM
        WHERE id = queue_item.id;
        -- Release the lock before re-raising so it does not leak
        PERFORM pg_advisory_unlock(lock_key);
        RAISE;  -- Re-raise the exception
    END;

    -- Release the lock
    PERFORM pg_advisory_unlock(lock_key);
END;
$$ LANGUAGE plpgsql;
```
5. Monitoring and Troubleshooting
Effective monitoring is essential for maintaining deadlock-free systems. This section covers comprehensive monitoring strategies, alerting, and troubleshooting techniques.
5.1 Real-Time Monitoring Queries
Comprehensive Lock Analysis View
```sql
CREATE OR REPLACE VIEW comprehensive_lock_analysis AS
WITH lock_details AS (
    SELECT
        pl.pid,
        pl.locktype,
        pl.mode,
        pl.granted,
        pl.database,
        pl.relation,
        pl.relation::regclass AS table_name,
        pl.page,
        pl.tuple,
        pl.virtualxid,
        pl.transactionid,
        pl.classid,
        pl.objid,
        pl.objsubid,
        psa.query,
        psa.state,
        psa.application_name,
        psa.client_addr,
        psa.usename,
        psa.datname,
        psa.backend_start,
        psa.xact_start,
        psa.query_start,
        psa.state_change,
        NOW() - psa.query_start AS query_duration,
        NOW() - psa.xact_start AS transaction_duration,
        psa.wait_event_type,
        psa.wait_event
    FROM pg_locks pl
    LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
),
blocking_info AS (
    SELECT
        blocked.pid AS blocked_pid,
        blocked.locktype AS blocked_locktype,
        blocked.mode AS blocked_mode,
        blocked.table_name AS blocked_table,
        blocking.pid AS blocking_pid,
        blocking.mode AS blocking_mode,
        blocking.query AS blocking_query,
        blocking.application_name AS blocking_app,
        blocking.transaction_duration AS blocking_duration
    FROM lock_details blocked
    JOIN lock_details blocking ON (
        blocking.locktype = blocked.locktype
        AND blocking.database IS NOT DISTINCT FROM blocked.database
        AND blocking.relation IS NOT DISTINCT FROM blocked.relation
        AND blocking.page IS NOT DISTINCT FROM blocked.page
        AND blocking.tuple IS NOT DISTINCT FROM blocked.tuple
        AND blocking.virtualxid IS NOT DISTINCT FROM blocked.virtualxid
        AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
        AND blocking.classid IS NOT DISTINCT FROM blocked.classid
        AND blocking.objid IS NOT DISTINCT FROM blocked.objid
        AND blocking.objsubid IS NOT DISTINCT FROM blocked.objsubid
        AND blocking.pid != blocked.pid
        AND blocking.granted = TRUE
        AND blocked.granted = FALSE
    )
)
SELECT
    ld.*,
    bi.blocking_pid,
    bi.blocking_mode,
    bi.blocking_query,
    bi.blocking_app,
    bi.blocking_duration,
    CASE
        WHEN ld.granted = FALSE THEN 'WAITING'
        WHEN bi.blocking_pid IS NOT NULL THEN 'BLOCKING'
        ELSE 'ACTIVE'
    END AS lock_status
FROM lock_details ld
LEFT JOIN blocking_info bi ON ld.pid = bi.blocked_pid
ORDER BY
    CASE WHEN ld.granted = FALSE THEN 1 ELSE 2 END,
    ld.query_duration DESC;
```
Lock Wait Chain Analysis
```sql
-- Recursive CTE to find complete lock wait chains
WITH RECURSIVE lock_chain AS (
    -- Base case: find root blockers (blocking but not blocked)
    SELECT
        blocking.pid AS blocker_pid,
        blocked.pid AS blocked_pid,
        blocking.query AS blocker_query,
        blocked.query AS blocked_query,
        blocking.application_name AS blocker_app,
        blocked.application_name AS blocked_app,
        1 AS chain_depth,
        ARRAY[blocking.pid] AS chain_pids,
        blocking.pid::TEXT AS chain_path
    FROM comprehensive_lock_analysis blocking
    JOIN comprehensive_lock_analysis blocked
        ON blocking.pid = blocked.blocking_pid
    WHERE blocking.blocking_pid IS NULL  -- Root blocker

    UNION ALL

    -- Recursive case: extend chains
    SELECT
        lc.blocker_pid,
        cla.pid AS blocked_pid,
        lc.blocker_query,
        cla.query AS blocked_query,
        lc.blocker_app,
        cla.application_name AS blocked_app,
        lc.chain_depth + 1,
        lc.chain_pids || cla.pid,
        lc.chain_path || ' -> ' || cla.pid::TEXT
    FROM lock_chain lc
    JOIN comprehensive_lock_analysis cla
        ON lc.blocked_pid = cla.blocking_pid
    WHERE cla.pid != ALL(lc.chain_pids)  -- Prevent infinite loops
        AND lc.chain_depth < 10          -- Reasonable depth limit
)
SELECT
    blocker_pid,
    blocked_pid,
    chain_depth,
    chain_path,
    blocker_query,
    blocked_query,
    blocker_app,
    blocked_app
FROM lock_chain
ORDER BY blocker_pid, chain_depth;
```
Historical Lock Statistics
```sql
-- Create table to store lock statistics over time
CREATE TABLE IF NOT EXISTS lock_statistics (
    recorded_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    locktype TEXT,
    mode TEXT,
    total_locks INTEGER,
    waiting_locks INTEGER,
    avg_wait_duration_ms NUMERIC,
    max_wait_duration_ms NUMERIC,
    blocked_sessions INTEGER,
    blocking_sessions INTEGER
);
```
Automated Lock Statistics Collection
```sql
-- Function to collect and store lock statistics
CREATE OR REPLACE FUNCTION collect_lock_statistics() RETURNS VOID AS $$
BEGIN
    INSERT INTO lock_statistics (
        locktype, mode, total_locks, waiting_locks,
        avg_wait_duration_ms, max_wait_duration_ms,
        blocked_sessions, blocking_sessions
    )
    SELECT
        pl.locktype,
        pl.mode,
        COUNT(*) AS total_locks,
        COUNT(*) FILTER (WHERE NOT pl.granted) AS waiting_locks,
        AVG(EXTRACT(EPOCH FROM (NOW() - psa.query_start)) * 1000)
            FILTER (WHERE NOT pl.granted) AS avg_wait_duration_ms,
        MAX(EXTRACT(EPOCH FROM (NOW() - psa.query_start)) * 1000)
            FILTER (WHERE NOT pl.granted) AS max_wait_duration_ms,
        COUNT(DISTINCT pl.pid) FILTER (WHERE NOT pl.granted) AS blocked_sessions,
        COUNT(DISTINCT blocking.pid) AS blocking_sessions
    FROM pg_locks pl
    LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
    LEFT JOIN (
        SELECT DISTINCT blocking_pid AS pid
        FROM comprehensive_lock_analysis
        WHERE blocking_pid IS NOT NULL
    ) blocking ON pl.pid = blocking.pid
    GROUP BY pl.locktype, pl.mode
    HAVING COUNT(*) FILTER (WHERE NOT pl.granted) > 0
        OR COUNT(DISTINCT blocking.pid) > 0;
END;
$$ LANGUAGE plpgsql;

-- Schedule collection every minute (requires the pg_cron extension)
SELECT cron.schedule('collect-lock-stats', '* * * * *', 'SELECT collect_lock_statistics()');
```
Deadlock History Analysis
```sql
-- Parse deadlock information from PostgreSQL logs
CREATE TABLE IF NOT EXISTS deadlock_incidents (
    id SERIAL PRIMARY KEY,
    occurred_at TIMESTAMP WITH TIME ZONE,
    process_1_pid INTEGER,
    process_1_query TEXT,
    process_1_application TEXT,
    process_2_pid INTEGER,
    process_2_query TEXT,
    process_2_application TEXT,
    victim_pid INTEGER,
    involved_tables TEXT[],
    lock_types TEXT[],
    resolution_time_ms INTEGER,
    log_entry TEXT
);

-- Function to analyze deadlock patterns
CREATE OR REPLACE FUNCTION analyze_deadlock_patterns(days_back INTEGER DEFAULT 7)
RETURNS TABLE (
    pattern_type TEXT,
    frequency INTEGER,
    avg_resolution_time_ms NUMERIC,
    involved_tables TEXT[],
    common_applications TEXT[]
) AS $$
BEGIN
    RETURN QUERY
    WITH deadlock_analysis AS (
        SELECT
            CASE
                WHEN array_length(di.involved_tables, 1) = 1 THEN 'Single Table'
                WHEN array_length(di.involved_tables, 1) = 2 THEN 'Two Table'
                ELSE 'Multi Table'
            END AS pattern_type,
            di.resolution_time_ms,
            di.involved_tables,
            ARRAY[di.process_1_application, di.process_2_application] AS applications
        FROM deadlock_incidents di
        WHERE di.occurred_at >= NOW() - (days_back || ' days')::INTERVAL
    )
    SELECT
        da.pattern_type,
        COUNT(*)::INTEGER AS frequency,
        AVG(da.resolution_time_ms) AS avg_resolution_time_ms,
        array_agg(DISTINCT involved_table_unnest) AS involved_tables,
        array_agg(DISTINCT app_unnest) AS common_applications
    FROM deadlock_analysis da,
         unnest(da.involved_tables) AS involved_table_unnest,
         unnest(da.applications) AS app_unnest
    GROUP BY da.pattern_type
    ORDER BY COUNT(*) DESC;
END;
$$ LANGUAGE plpgsql;
```
5.2 Alerting and Notification Systems
Real-Time Deadlock Alerts
```sql
-- Create alert function for immediate deadlock detection
CREATE OR REPLACE FUNCTION send_deadlock_alert(
    alert_message TEXT,
    severity TEXT DEFAULT 'HIGH'
) RETURNS VOID AS $$
DECLARE
    webhook_url TEXT := 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL';
    payload JSON;
BEGIN
    -- Construct alert payload
    payload := json_build_object(
        'text', alert_message,
        'username', 'PostgreSQL Monitor',
        'channel', '#alerts',
        'attachments', json_build_array(
            json_build_object(
                'color', CASE
                    WHEN severity = 'CRITICAL' THEN 'danger'
                    WHEN severity = 'HIGH' THEN 'warning'
                    ELSE 'good'
                END,
                'fields', json_build_array(
                    json_build_object(
                        'title', 'Database',
                        'value', current_database(),
                        'short', true),
                    json_build_object(
                        'title', 'Timestamp',
                        'value', NOW()::TEXT,
                        'short', true)
                )
            )
        )
    );

    -- Send webhook notification (requires http extension)
    PERFORM http_post(webhook_url, payload::TEXT, 'application/json');

    -- Also log to PostgreSQL log
    RAISE WARNING 'DEADLOCK ALERT [%]: %', severity, alert_message;
EXCEPTION WHEN OTHERS THEN
    -- If webhook fails, at least log locally
    RAISE WARNING 'Failed to send deadlock alert: % - Original message: %',
        SQLERRM, alert_message;
END;
$$ LANGUAGE plpgsql;

-- Monitoring function for long-running lock waits
CREATE OR REPLACE FUNCTION monitor_lock_waits() RETURNS VOID AS $$
DECLARE
    long_wait_threshold INTERVAL := '30 seconds';
    critical_wait_threshold INTERVAL := '2 minutes';
    wait_record RECORD;
    alert_message TEXT;
BEGIN
    FOR wait_record IN
        SELECT
            pid,
            application_name,
            query,
            locktype,
            mode,
            table_name,
            query_duration,
            blocking_pid,
            blocking_query,
            blocking_app
        FROM comprehensive_lock_analysis
        WHERE lock_status = 'WAITING'
            AND query_duration > long_wait_threshold
    LOOP
        alert_message := format(
            'Lock wait detected: PID %s (%s) waiting %s for %s lock on %s. Blocked by PID %s (%s)',
            wait_record.pid,
            wait_record.application_name,
            wait_record.query_duration,
            wait_record.mode,
            COALESCE(wait_record.table_name::TEXT, wait_record.locktype),
            wait_record.blocking_pid,
            wait_record.blocking_app);

        IF wait_record.query_duration > critical_wait_threshold THEN
            PERFORM send_deadlock_alert(alert_message, 'CRITICAL');
        ELSE
            PERFORM send_deadlock_alert(alert_message, 'HIGH');
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```
Automated Deadlock Resolution
```sql
-- Function to automatically resolve certain types of deadlocks
CREATE OR REPLACE FUNCTION auto_resolve_deadlocks() RETURNS TEXT[] AS $$
DECLARE
    resolution_actions TEXT[] := ARRAY[]::TEXT[];
    deadlock_session RECORD;
BEGIN
    -- Find sessions that have been waiting too long and are safe to kill
    FOR deadlock_session IN
        SELECT
            pid,
            application_name,
            query,
            query_duration,
            transaction_duration,
            usename
        FROM comprehensive_lock_analysis
        WHERE lock_status = 'WAITING'
            AND query_duration > INTERVAL '5 minutes'
            AND application_name NOT IN ('critical_app', 'admin_console')  -- Protect critical apps
            AND query !~* '(backup|restore|reindex|vacuum)'                -- Don't kill maintenance
        ORDER BY query_duration DESC
        LIMIT 5  -- Limit resolution actions per run
    LOOP
        -- Attempt to cancel the query first (graceful)
        PERFORM pg_cancel_backend(deadlock_session.pid);
        resolution_actions := resolution_actions ||
            format('Cancelled query for PID %s (app: %s, duration: %s)',
                   deadlock_session.pid,
                   deadlock_session.application_name,
                   deadlock_session.query_duration);

        -- Wait a moment and check if it worked
        PERFORM pg_sleep(2);

        -- If still running, terminate the connection
        IF EXISTS (SELECT 1 FROM pg_stat_activity WHERE pid = deadlock_session.pid) THEN
            PERFORM pg_terminate_backend(deadlock_session.pid);
            resolution_actions := resolution_actions ||
                format('Terminated connection for PID %s', deadlock_session.pid);
        END IF;

        -- Log the action
        RAISE WARNING 'Auto-resolved potential deadlock: PID %, Query: %',
            deadlock_session.pid,
            left(deadlock_session.query, 100);
    END LOOP;

    RETURN resolution_actions;
END;
$$ LANGUAGE plpgsql;
```
5.3 Performance Tuning for Deadlock Prevention
Connection Pool Optimization
```sql
-- Function to analyze connection pool efficiency
CREATE OR REPLACE FUNCTION analyze_connection_patterns()
RETURNS TABLE (
    application_name TEXT,
    avg_connections INTEGER,
    max_connections INTEGER,
    avg_transaction_duration INTERVAL,
    long_transaction_count INTEGER,
    idle_in_transaction_count INTEGER,
    deadlock_involvement_rate NUMERIC
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        psa.application_name,
        COUNT(*)::INTEGER AS avg_connections,
        COUNT(*)::INTEGER AS max_connections,  -- This would need historical data
        AVG(NOW() - psa.xact_start) AS avg_transaction_duration,
        COUNT(*) FILTER (WHERE NOW() - psa.xact_start > INTERVAL '1 minute')::INTEGER
            AS long_transaction_count,
        COUNT(*) FILTER (WHERE psa.state = 'idle in transaction')::INTEGER
            AS idle_in_transaction_count,
        0.0 AS deadlock_involvement_rate  -- Would need to join with deadlock history
    FROM pg_stat_activity psa
    WHERE psa.state NOT IN ('idle', '')
    GROUP BY psa.application_name
    ORDER BY long_transaction_count DESC, avg_transaction_duration DESC;
END;
$$ LANGUAGE plpgsql;

-- Recommended connection pool settings based on analysis
CREATE OR REPLACE FUNCTION recommend_pool_settings(app_name TEXT)
RETURNS TABLE (
    setting_name TEXT,
    current_value TEXT,
    recommended_value TEXT,
    reasoning TEXT
) AS $$
DECLARE
    app_stats RECORD;
BEGIN
    SELECT * INTO app_stats
    FROM analyze_connection_patterns()
    WHERE application_name = app_name;

    IF app_stats IS NULL THEN
        RAISE EXCEPTION 'Application % not found', app_name;
    END IF;

    -- Pool size recommendation
    RETURN QUERY VALUES (
        'max_pool_size',
        'unknown',
        CASE
            WHEN app_stats.avg_connections > 50 THEN '20-30'
            WHEN app_stats.avg_connections > 20 THEN '10-15'
            ELSE '5-10'
        END,
        format('Based on %s average connections', app_stats.avg_connections));

    -- Connection lifetime
    RETURN QUERY VALUES (
        'max_connection_lifetime',
        'unknown',
        CASE
            WHEN app_stats.avg_transaction_duration > INTERVAL '5 minutes' THEN '30 minutes'
            WHEN app_stats.avg_transaction_duration > INTERVAL '1 minute' THEN '15 minutes'
            ELSE '10 minutes'
        END,
        format('Based on %s average transaction duration', app_stats.avg_transaction_duration));

    -- Idle timeout
    RETURN QUERY VALUES (
        'idle_timeout',
        'unknown',
        CASE
            WHEN app_stats.idle_in_transaction_count > 5 THEN '30 seconds'
            WHEN app_stats.idle_in_transaction_count > 2 THEN '60 seconds'
            ELSE '120 seconds'
        END,
        format('Based on %s idle-in-transaction connections', app_stats.idle_in_transaction_count));
END;
$$ LANGUAGE plpgsql;
```
Index Optimization for Deadlock Reduction
-- Analyze index contention and suggest optimizations
CREATE OR REPLACE FUNCTION analyze_index_contention()
RETURNS TABLE (
  schema_name TEXT,
  table_name TEXT,
  index_name TEXT,
  index_size TEXT,
  index_scans BIGINT,
  index_tup_read BIGINT,
  index_tup_fetch BIGINT,
  contention_score NUMERIC,
  recommendation TEXT
) AS $$
BEGIN
  RETURN QUERY
  SELECT
    psi.schemaname::TEXT,
    psi.relname::TEXT,
    psi.indexrelname::TEXT,
    pg_size_pretty(pg_relation_size(psi.indexrelid)) AS index_size,
    psi.idx_scan,
    psi.idx_tup_read,
    psi.idx_tup_fetch,
    -- Simple contention score based on the read/fetch ratio
    CASE
      WHEN psi.idx_tup_fetch = 0 THEN 0
      ELSE ROUND(psi.idx_tup_read::NUMERIC / psi.idx_tup_fetch::NUMERIC, 2)
    END AS contention_score,
    CASE
      WHEN psi.idx_scan = 0 THEN 'Consider dropping - unused index'
      WHEN psi.idx_tup_read::NUMERIC / NULLIF(psi.idx_tup_fetch, 0) > 100 THEN 'High contention - consider partitioning'
      WHEN pg_relation_size(psi.indexrelid) > 1024*1024*100 THEN 'Large index - monitor for hotspots'
      ELSE 'Normal usage pattern'
    END AS recommendation
  FROM pg_stat_user_indexes psi
  WHERE psi.idx_scan > 0 OR psi.idx_tup_read > 0
  -- ordinal position avoids clashing with the OUT parameter name
  ORDER BY 8 DESC NULLS LAST;
END;
$$ LANGUAGE plpgsql;

-- Generate index optimization suggestions
CREATE OR REPLACE FUNCTION suggest_index_optimizations(target_table TEXT DEFAULT NULL)
RETURNS TABLE (
  optimization_type TEXT,
  current_situation TEXT,
  suggested_action TEXT,
  expected_benefit TEXT
) AS $$
BEGIN
  -- Missing indexes on foreign key columns
  RETURN QUERY
  SELECT
    'Missing FK Index'::TEXT,
    format('Table %s.%s, column %s', tc.table_schema, tc.table_name, kcu.column_name),
    format('CREATE INDEX idx_%s_%s ON %s.%s (%s);',
           tc.table_name, kcu.column_name, tc.table_schema, tc.table_name, kcu.column_name),
    'Reduce deadlocks on foreign key lookups'::TEXT
  FROM information_schema.table_constraints tc
  JOIN information_schema.key_column_usage kcu
    USING (constraint_name, table_schema, table_name)
  WHERE tc.constraint_type = 'FOREIGN KEY'
    AND (target_table IS NULL OR tc.table_name = target_table)
    AND NOT EXISTS (
      SELECT 1 FROM pg_indexes pi
      WHERE pi.schemaname = tc.table_schema
        AND pi.tablename = tc.table_name
        AND pi.indexdef ~* ('\(' || kcu.column_name || '\)')
    );

  -- Partial indexes for common WHERE conditions
  RETURN QUERY
  SELECT
    'Partial Index Opportunity'::TEXT,
    format('Frequent queries on %s with WHERE conditions', target_table),
    'CREATE INDEX idx_partial ON table (column) WHERE condition;'::TEXT,
    'Reduce index size and contention'::TEXT
  WHERE target_table IS NOT NULL;
END;
$$ LANGUAGE plpgsql;
6. Advanced Prevention Techniques
6.1 Database Design Patterns for Deadlock Prevention
Event Sourcing Pattern
-- Instead of updating records directly, append events
CREATE TABLE account_events (
  id BIGSERIAL PRIMARY KEY,
  account_id INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  amount DECIMAL(15,2),
  metadata JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  sequence_number BIGSERIAL
);

-- Index for efficient event replay
CREATE INDEX idx_account_events_account_sequence
  ON account_events (account_id, sequence_number);

-- Append events (no locks on existing data)
CREATE OR REPLACE FUNCTION append_account_event(
  p_account_id INTEGER,
  p_event_type TEXT,
  p_amount DECIMAL DEFAULT NULL,
  p_metadata JSONB DEFAULT NULL
) RETURNS BIGINT AS $$
DECLARE
  event_id BIGINT;
BEGIN
  INSERT INTO account_events (account_id, event_type, amount, metadata)
  VALUES (p_account_id, p_event_type, p_amount, p_metadata)
  RETURNING id INTO event_id;

  -- Optionally update the materialized view asynchronously
  PERFORM pg_notify('account_changed', p_account_id::TEXT);

  RETURN event_id;
END;
$$ LANGUAGE plpgsql;

-- Materialized view for current state (refreshed asynchronously)
CREATE MATERIALIZED VIEW account_balances AS
SELECT
  account_id,
  SUM(CASE
        WHEN event_type = 'credit' THEN amount
        WHEN event_type = 'debit' THEN -amount
        ELSE 0
      END) AS current_balance,
  COUNT(*) AS transaction_count,
  MAX(created_at) AS last_transaction_at
FROM account_events
GROUP BY account_id;

CREATE UNIQUE INDEX idx_account_balances_account_id
  ON account_balances (account_id);
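The materialized view above is just a fold over the event stream. This Python sketch (illustrative only; the function name is hypothetical) performs the same credit/debit reduction, which is handy for unit-testing event handlers without a database:

```python
def current_balance(events):
    """Fold (event_type, amount) pairs into a balance, mirroring the
    CASE expression in the account_balances materialized view."""
    balance = 0.0
    for event_type, amount in events:
        if event_type == "credit":
            balance += amount
        elif event_type == "debit":
            balance -= amount
        # any other event type contributes 0, as in the SQL CASE
    return balance
```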
Queue-Based Processing
-- Serialize conflicting operations through a queue
CREATE TABLE operation_queue (
  id BIGSERIAL PRIMARY KEY,
  resource_key TEXT NOT NULL,  -- which resource this operation affects
  operation_type TEXT NOT NULL,
  operation_data JSONB NOT NULL,
  priority INTEGER DEFAULT 0,
  status TEXT DEFAULT 'pending',
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  started_at TIMESTAMP WITH TIME ZONE,
  completed_at TIMESTAMP WITH TIME ZONE,
  worker_id TEXT,
  retry_count INTEGER DEFAULT 0,
  error_message TEXT
);

CREATE INDEX idx_operation_queue_processing
  ON operation_queue (resource_key, status, priority DESC, created_at);

-- Worker function to process operations sequentially per resource
CREATE OR REPLACE FUNCTION process_operation_queue(
  worker_id TEXT,
  max_operations INTEGER DEFAULT 10
) RETURNS INTEGER AS $$
DECLARE
  operation RECORD;
  operations_processed INTEGER := 0;
BEGIN
  -- Process one operation per resource to avoid deadlocks.
  -- FOR UPDATE cannot be combined with DISTINCT ON, so the DISTINCT ON
  -- selection goes in a subquery and the locking happens on the outer
  -- operation_queue reference.
  FOR operation IN
    SELECT oq.*
    FROM operation_queue oq
    JOIN (
      SELECT DISTINCT ON (resource_key) id
      FROM operation_queue
      WHERE status = 'pending'
      ORDER BY resource_key, priority DESC, created_at ASC
      LIMIT max_operations
    ) next_ops USING (id)
    FOR UPDATE OF oq SKIP LOCKED
  LOOP
    -- Mark as in progress
    UPDATE operation_queue
    SET status = 'processing',
        started_at = NOW(),
        worker_id = process_operation_queue.worker_id
    WHERE id = operation.id;

    BEGIN
      -- Process the operation
      CASE operation.operation_type
        WHEN 'transfer' THEN
          PERFORM execute_transfer(operation.operation_data);
        WHEN 'update_inventory' THEN
          PERFORM update_inventory(operation.operation_data);
        ELSE
          RAISE EXCEPTION 'Unknown operation type: %', operation.operation_type;
      END CASE;

      -- Mark as completed
      UPDATE operation_queue
      SET status = 'completed', completed_at = NOW()
      WHERE id = operation.id;

      operations_processed := operations_processed + 1;
    EXCEPTION WHEN OTHERS THEN
      -- Mark as failed, or requeue for retry
      UPDATE operation_queue
      SET status = CASE WHEN retry_count < 3 THEN 'pending' ELSE 'failed' END,
          retry_count = retry_count + 1,
          error_message = SQLERRM
      WHERE id = operation.id;
    END;
  END LOOP;

  RETURN operations_processed;
END;
$$ LANGUAGE plpgsql;
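The selection rule in process_operation_queue() — at most one pending operation per resource_key, highest priority first, oldest first on ties — can be unit-tested outside the database. A minimal Python sketch of the same ordering (dictionary keys assumed to match the queue columns):

```python
from typing import Dict, List


def next_batch(pending: List[Dict], max_operations: int) -> List[Dict]:
    """Pick at most one pending operation per resource_key, mirroring
    the DISTINCT ON ordering: resource_key, priority DESC, created_at ASC."""
    ordered = sorted(
        pending,
        key=lambda op: (op["resource_key"], -op["priority"], op["created_at"]),
    )
    batch: List[Dict] = []
    claimed = set()
    for op in ordered:
        if op["resource_key"] in claimed:
            continue  # another operation already claimed this resource
        claimed.add(op["resource_key"])
        batch.append(op)
        if len(batch) >= max_operations:
            break
    return batch
```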
6.2 Application-Level Deadlock Prevention
Distributed Lock Manager
-- Distributed lock management using PostgreSQL
CREATE TABLE distributed_locks (
  lock_name TEXT PRIMARY KEY,
  owner_id TEXT NOT NULL,
  acquired_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
  metadata JSONB,
  heartbeat_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_distributed_locks_expires
  ON distributed_locks (expires_at);

-- Acquire a distributed lock
CREATE OR REPLACE FUNCTION acquire_distributed_lock(
  p_lock_name TEXT,
  p_owner_id TEXT,
  p_timeout_seconds INTEGER DEFAULT 300,
  p_metadata JSONB DEFAULT NULL
) RETURNS BOOLEAN AS $$
DECLARE
  lock_acquired BOOLEAN := FALSE;
  expiry_time TIMESTAMP WITH TIME ZONE;
BEGIN
  expiry_time := NOW() + (p_timeout_seconds || ' seconds')::INTERVAL;

  -- Clean up expired locks first
  DELETE FROM distributed_locks WHERE expires_at < NOW();

  BEGIN
    -- Try to acquire the lock
    INSERT INTO distributed_locks (lock_name, owner_id, expires_at, metadata)
    VALUES (p_lock_name, p_owner_id, expiry_time, p_metadata);
    lock_acquired := TRUE;
  EXCEPTION WHEN unique_violation THEN
    -- Lock already exists; check whether we own it
    UPDATE distributed_locks
    SET expires_at = expiry_time,
        heartbeat_at = NOW(),
        metadata = COALESCE(p_metadata, metadata)
    WHERE lock_name = p_lock_name
      AND owner_id = p_owner_id;
    lock_acquired := FOUND;
  END;

  RETURN lock_acquired;
END;
$$ LANGUAGE plpgsql;

-- Release a distributed lock
CREATE OR REPLACE FUNCTION release_distributed_lock(
  p_lock_name TEXT,
  p_owner_id TEXT
) RETURNS BOOLEAN AS $$
BEGIN
  DELETE FROM distributed_locks
  WHERE lock_name = p_lock_name
    AND owner_id = p_owner_id;
  RETURN FOUND;
END;
$$ LANGUAGE plpgsql;

-- Heartbeat to keep a lock alive
CREATE OR REPLACE FUNCTION heartbeat_distributed_lock(
  p_lock_name TEXT,
  p_owner_id TEXT,
  p_extend_seconds INTEGER DEFAULT 300
) RETURNS BOOLEAN AS $$
BEGIN
  UPDATE distributed_locks
  SET heartbeat_at = NOW(),
      expires_at = NOW() + (p_extend_seconds || ' seconds')::INTERVAL
  WHERE lock_name = p_lock_name
    AND owner_id = p_owner_id
    AND expires_at > NOW();  -- only extend if not expired
  RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
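The acquire/release semantics above are easy to get wrong, so a toy in-memory model can help reason about them. This Python sketch mirrors the SQL behavior — expired rows are swept on acquire, the current owner may re-acquire to extend, and release only succeeds for the owner. It is illustrative only; the real coordination must happen in PostgreSQL:

```python
from datetime import datetime, timedelta


class InMemoryLockTable:
    """Toy model of the distributed_locks table, showing the same
    acquire / release semantics as the SQL functions above."""

    def __init__(self):
        self.rows = {}  # lock_name -> (owner_id, expires_at)

    def acquire(self, lock_name, owner_id, now, timeout_seconds=300):
        # Sweep expired locks first, as acquire_distributed_lock() does
        self.rows = {k: v for k, v in self.rows.items() if v[1] > now}
        expiry = now + timedelta(seconds=timeout_seconds)
        current = self.rows.get(lock_name)
        if current is None or current[0] == owner_id:
            # Free, or a re-entrant acquire by the current owner (extends expiry)
            self.rows[lock_name] = (owner_id, expiry)
            return True
        return False  # held by someone else

    def release(self, lock_name, owner_id):
        # Only the owner may release, matching the DELETE's WHERE clause
        if self.rows.get(lock_name, (None,))[0] == owner_id:
            del self.rows[lock_name]
            return True
        return False
```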
6.3 Microservices and Deadlock Prevention
Saga Pattern Implementation
-- Saga orchestration for distributed transactions
CREATE TABLE sagas (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  saga_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'started',
  data JSONB NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  completed_at TIMESTAMP WITH TIME ZONE,
  compensated_at TIMESTAMP WITH TIME ZONE
);

CREATE TABLE saga_steps (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  saga_id UUID NOT NULL REFERENCES sagas(id),
  step_number INTEGER NOT NULL,
  step_name TEXT NOT NULL,
  step_type TEXT NOT NULL,  -- 'action' or 'compensation'
  status TEXT NOT NULL DEFAULT 'pending',
  input_data JSONB,
  output_data JSONB,
  error_message TEXT,
  started_at TIMESTAMP WITH TIME ZONE,
  completed_at TIMESTAMP WITH TIME ZONE,
  UNIQUE (saga_id, step_number)
);

-- Execute a saga step
CREATE OR REPLACE FUNCTION execute_saga_step(
  p_saga_id UUID,
  p_step_number INTEGER
) RETURNS BOOLEAN AS $$
DECLARE
  saga_record RECORD;
  step_record RECORD;
  success BOOLEAN := FALSE;
BEGIN
  -- Fetch saga and step information
  SELECT * INTO saga_record FROM sagas WHERE id = p_saga_id;
  SELECT * INTO step_record FROM saga_steps
  WHERE saga_id = p_saga_id AND step_number = p_step_number;

  IF saga_record IS NULL OR step_record IS NULL THEN
    RETURN FALSE;
  END IF;

  -- Mark step as started
  UPDATE saga_steps
  SET status = 'running', started_at = NOW()
  WHERE id = step_record.id;

  BEGIN
    -- Dispatch on step_name
    CASE step_record.step_name
      WHEN 'reserve_inventory' THEN
        PERFORM reserve_inventory_saga(step_record.input_data);
      WHEN 'charge_payment' THEN
        PERFORM charge_payment_saga(step_record.input_data);
      WHEN 'create_shipment' THEN
        PERFORM create_shipment_saga(step_record.input_data);
      -- Compensation steps
      WHEN 'release_inventory' THEN
        PERFORM release_inventory_saga(step_record.input_data);
      WHEN 'refund_payment' THEN
        PERFORM refund_payment_saga(step_record.input_data);
      WHEN 'cancel_shipment' THEN
        PERFORM cancel_shipment_saga(step_record.input_data);
      ELSE
        RAISE EXCEPTION 'Unknown saga step: %', step_record.step_name;
    END CASE;

    -- Mark step as completed
    UPDATE saga_steps
    SET status = 'completed', completed_at = NOW()
    WHERE id = step_record.id;
    success := TRUE;
  EXCEPTION WHEN OTHERS THEN
    -- Mark step as failed
    UPDATE saga_steps
    SET status = 'failed',
        error_message = SQLERRM,
        completed_at = NOW()
    WHERE id = step_record.id;

    -- Trigger compensation
    PERFORM compensate_saga(p_saga_id);
    success := FALSE;
  END;

  RETURN success;
END;
$$ LANGUAGE plpgsql;
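When a step fails, compensate_saga() (referenced but not shown above) must undo the already-completed steps in reverse order, using the compensation branches of the CASE. A minimal Python sketch of that rule, reusing the step names from the function:

```python
# Forward step -> compensating step, taken from the CASE branches above
COMPENSATIONS = {
    "reserve_inventory": "release_inventory",
    "charge_payment": "refund_payment",
    "create_shipment": "cancel_shipment",
}


def compensation_plan(completed_steps):
    """Undo completed steps in reverse order -- the core invariant any
    compensate_saga() implementation has to preserve."""
    return [
        COMPENSATIONS[step]
        for step in reversed(completed_steps)
        if step in COMPENSATIONS
    ]
```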
7. Production Deployment Checklist
7.1 Configuration Validation
-- Validate deadlock prevention configuration
CREATE OR REPLACE FUNCTION validate_deadlock_config()
RETURNS TABLE (
  check_name TEXT,
  status TEXT,
  current_value TEXT,
  recommended_value TEXT,
  severity TEXT
) AS $$
BEGIN
  -- Check deadlock_timeout
  RETURN QUERY
  SELECT
    'deadlock_timeout'::TEXT,
    CASE
      WHEN current_setting('deadlock_timeout')::INTERVAL > INTERVAL '2 seconds'
      THEN 'WARNING' ELSE 'OK'
    END,
    current_setting('deadlock_timeout'),
    '200ms - 1s'::TEXT,
    'MEDIUM'::TEXT;

  -- Check log_lock_waits
  RETURN QUERY
  SELECT
    'log_lock_waits'::TEXT,
    CASE
      WHEN current_setting('log_lock_waits') = 'off'
      THEN 'WARNING' ELSE 'OK'
    END,
    current_setting('log_lock_waits'),
    'on'::TEXT,
    'HIGH'::TEXT;

  -- Check max_connections vs. max_locks_per_transaction
  RETURN QUERY
  SELECT
    'lock_memory_sizing'::TEXT,
    CASE
      WHEN current_setting('max_connections')::INTEGER *
           current_setting('max_locks_per_transaction')::INTEGER > 100000
      THEN 'WARNING' ELSE 'OK'
    END,
    format('%s connections × %s locks = %s total',
           current_setting('max_connections'),
           current_setting('max_locks_per_transaction'),
           (current_setting('max_connections')::INTEGER *
            current_setting('max_locks_per_transaction')::INTEGER)::TEXT),
    'Consider reducing if memory limited'::TEXT,
    'LOW'::TEXT;

  -- Check statement_timeout
  RETURN QUERY
  SELECT
    'statement_timeout'::TEXT,
    CASE
      WHEN current_setting('statement_timeout')::INTERVAL = INTERVAL '0'
      THEN 'WARNING' ELSE 'OK'
    END,
    current_setting('statement_timeout'),
    '30s - 300s depending on workload'::TEXT,
    'MEDIUM'::TEXT;
END;
$$ LANGUAGE plpgsql;
7.2 Deployment Monitoring Script
-- Comprehensive monitoring dashboard
CREATE OR REPLACE VIEW deadlock_prevention_dashboard AS
WITH dashboard AS (
  SELECT
    'Configuration' AS category,
    check_name AS metric,
    status AS value,
    severity AS priority,
    current_value AS details
  FROM validate_deadlock_config()

  UNION ALL

  SELECT
    'Current Locks' AS category,
    lock_status AS metric,
    COUNT(*)::TEXT AS value,
    CASE
      WHEN lock_status = 'WAITING' AND COUNT(*) > 5 THEN 'HIGH'
      WHEN lock_status = 'WAITING' AND COUNT(*) > 0 THEN 'MEDIUM'
      ELSE 'LOW'
    END AS priority,
    string_agg(DISTINCT table_name::TEXT, ', ') AS details
  FROM comprehensive_lock_analysis
  GROUP BY lock_status

  UNION ALL

  SELECT
    'Performance' AS category,
    'Average Transaction Duration' AS metric,
    ROUND(EXTRACT(EPOCH FROM AVG(NOW() - xact_start))::NUMERIC, 2)::TEXT || 's' AS value,
    CASE
      WHEN AVG(NOW() - xact_start) > INTERVAL '1 minute' THEN 'HIGH'
      WHEN AVG(NOW() - xact_start) > INTERVAL '10 seconds' THEN 'MEDIUM'
      ELSE 'LOW'
    END AS priority,
    COUNT(*)::TEXT || ' active transactions' AS details
  FROM pg_stat_activity
  WHERE state = 'active' AND xact_start IS NOT NULL

  UNION ALL

  SELECT
    'Connection Health' AS category,
    'Idle in Transaction' AS metric,
    COUNT(*)::TEXT AS value,
    CASE
      WHEN COUNT(*) > 10 THEN 'HIGH'
      WHEN COUNT(*) > 5 THEN 'MEDIUM'
      ELSE 'LOW'
    END AS priority,
    string_agg(DISTINCT application_name, ', ') AS details
  FROM pg_stat_activity
  WHERE state = 'idle in transaction'
  GROUP BY state
)
-- An ORDER BY attached directly to a UNION may only name plain output
-- columns, so the ranking expression lives outside the CTE
SELECT *
FROM dashboard
ORDER BY
  CASE priority WHEN 'HIGH' THEN 1 WHEN 'MEDIUM' THEN 2 WHEN 'LOW' THEN 3 END,
  category, metric;

-- Health check function for automated monitoring
CREATE OR REPLACE FUNCTION deadlock_health_check()
RETURNS TABLE (
  overall_status TEXT,
  critical_issues INTEGER,
  warnings INTEGER,
  recommendations TEXT[]
) AS $$
DECLARE
  critical_count INTEGER := 0;
  warning_count INTEGER := 0;
  rec_array TEXT[] := ARRAY[]::TEXT[];
  dashboard_row RECORD;
BEGIN
  -- Count issues from the dashboard
  FOR dashboard_row IN SELECT * FROM deadlock_prevention_dashboard LOOP
    CASE dashboard_row.priority
      WHEN 'HIGH' THEN
        critical_count := critical_count + 1;
        rec_array := rec_array || format('CRITICAL: %s - %s',
                       dashboard_row.metric, dashboard_row.details);
      WHEN 'MEDIUM' THEN
        warning_count := warning_count + 1;
        rec_array := rec_array || format('WARNING: %s - %s',
                       dashboard_row.metric, dashboard_row.details);
      ELSE
        NULL;  -- LOW rows need no action (a plpgsql CASE without a
               -- matching branch would raise CASE_NOT_FOUND)
    END CASE;
  END LOOP;

  -- Overall assessment
  RETURN QUERY SELECT
    CASE
      WHEN critical_count > 0 THEN 'CRITICAL'
      WHEN warning_count > 3 THEN 'WARNING'
      ELSE 'HEALTHY'
    END::TEXT,
    critical_count,
    warning_count,
    rec_array;
END;
$$ LANGUAGE plpgsql;
7.3 Automated Testing Framework
-- Deadlock simulation for testing
CREATE OR REPLACE FUNCTION simulate_deadlock_scenario(
  scenario_name TEXT,
  concurrent_sessions INTEGER DEFAULT 2
) RETURNS TEXT AS $$
DECLARE
  result TEXT;
  session_count INTEGER;
BEGIN
  -- Validate scenario
  IF scenario_name NOT IN ('two_table_transfer', 'foreign_key_insert', 'index_contention') THEN
    RAISE EXCEPTION 'Unknown scenario: %', scenario_name;
  END IF;

  -- Check current load
  SELECT COUNT(*) INTO session_count
  FROM pg_stat_activity
  WHERE state = 'active';

  IF session_count > 10 THEN
    RETURN 'SKIPPED: High system load detected';
  END IF;

  -- Execute scenario
  CASE scenario_name
    WHEN 'two_table_transfer' THEN
      result := simulate_transfer_deadlock(concurrent_sessions);
    WHEN 'foreign_key_insert' THEN
      result := simulate_fk_deadlock(concurrent_sessions);
    WHEN 'index_contention' THEN
      result := simulate_index_deadlock(concurrent_sessions);
  END CASE;

  RETURN result;
END;
$$ LANGUAGE plpgsql;

-- Two-table transfer deadlock simulation
CREATE OR REPLACE FUNCTION simulate_transfer_deadlock(sessions INTEGER)
RETURNS TEXT AS $$
DECLARE
  start_time TIMESTAMP;
  end_time TIMESTAMP;
  deadlock_hit BOOLEAN := FALSE;  -- named so it does not shadow the
                                  -- deadlock_detected error condition
BEGIN
  -- Create the test table if it doesn't exist
  CREATE TABLE IF NOT EXISTS test_accounts (
    id SERIAL PRIMARY KEY,
    balance DECIMAL(10,2) DEFAULT 1000.00,
    updated_at TIMESTAMP DEFAULT NOW()
  );

  -- Ensure test data
  INSERT INTO test_accounts (id, balance)
  VALUES (1, 1000.00), (2, 1000.00)
  ON CONFLICT (id) DO UPDATE SET balance = EXCLUDED.balance;

  start_time := clock_timestamp();

  BEGIN
    -- Simulate concurrent transfers in opposite directions.
    -- A real deadlock needs multiple connections; for testing, advisory
    -- locks imitate the contention pattern within one session.
    IF pg_try_advisory_lock(1) THEN
      -- Session 1 simulation
      UPDATE test_accounts SET balance = balance - 100 WHERE id = 1;
      PERFORM pg_sleep(0.1);  -- simulate processing delay
      IF pg_try_advisory_lock(2) THEN
        UPDATE test_accounts SET balance = balance + 100 WHERE id = 2;
        PERFORM pg_advisory_unlock(2);
        PERFORM pg_advisory_unlock(1);
        deadlock_hit := FALSE;
      ELSE
        PERFORM pg_advisory_unlock(1);
        deadlock_hit := TRUE;
      END IF;
    END IF;
  EXCEPTION WHEN deadlock_detected THEN
    deadlock_hit := TRUE;
  END;

  end_time := clock_timestamp();

  -- Clean up test data
  DELETE FROM test_accounts WHERE id IN (1, 2);

  RETURN format('Simulation completed in %s ms. Deadlock detected: %s',
                EXTRACT(MILLISECONDS FROM (end_time - start_time))::INTEGER,
                deadlock_hit);
END;
$$ LANGUAGE plpgsql;
8. Emergency Response Procedures
8.1 Deadlock Incident Response Plan
-- Emergency deadlock resolution
CREATE OR REPLACE FUNCTION emergency_deadlock_resolution(
  max_kill_count INTEGER DEFAULT 5
) RETURNS TABLE (
  action_taken TEXT,
  pid INTEGER,
  application_name TEXT,
  query_duration INTERVAL,
  query_preview TEXT
) AS $$
DECLARE
  victim RECORD;
  kill_count INTEGER := 0;
BEGIN
  -- First, try graceful query cancellation
  FOR victim IN
    SELECT
      psa.pid,
      psa.application_name,
      NOW() - psa.query_start AS duration,
      LEFT(psa.query, 100) AS query_preview
    FROM pg_stat_activity psa
    JOIN comprehensive_lock_analysis cla ON psa.pid = cla.pid
    WHERE cla.lock_status = 'WAITING'
      AND psa.query_start < NOW() - INTERVAL '2 minutes'
      AND psa.application_name NOT LIKE '%admin%'
    ORDER BY psa.query_start ASC
    LIMIT max_kill_count
  LOOP
    -- Try to cancel the query first
    IF pg_cancel_backend(victim.pid) THEN
      kill_count := kill_count + 1;
      RETURN QUERY SELECT
        'CANCELLED'::TEXT, victim.pid, victim.application_name,
        victim.duration, victim.query_preview;

      -- Wait briefly and check whether the cancel took effect
      PERFORM pg_sleep(1);

      -- If the query is still active, terminate the connection.
      -- (Check state, not mere existence: a cancelled backend stays
      -- connected, so an existence check would always terminate.)
      IF EXISTS (SELECT 1 FROM pg_stat_activity psa
                 WHERE psa.pid = victim.pid AND psa.state = 'active') THEN
        PERFORM pg_terminate_backend(victim.pid);
        RETURN QUERY SELECT
          'TERMINATED'::TEXT, victim.pid, victim.application_name,
          victim.duration, victim.query_preview;
      END IF;
    END IF;

    EXIT WHEN kill_count >= max_kill_count;
  END LOOP;

  -- Log the emergency action
  INSERT INTO deadlock_incidents (occurred_at, victim_pid, log_entry)
  VALUES (NOW(), NULL, format('Emergency resolution killed %s sessions', kill_count));

  IF kill_count = 0 THEN
    RETURN QUERY SELECT
      'NO_ACTION'::TEXT, NULL::INTEGER,
      'No long-running waiting sessions found'::TEXT,
      NULL::INTERVAL, NULL::TEXT;
  END IF;
END;
$$ LANGUAGE plpgsql;
8.2 Post-Incident Analysis
-- Analyze a deadlock incident after resolution
CREATE OR REPLACE FUNCTION analyze_deadlock_incident(
  incident_start TIMESTAMP,
  incident_end TIMESTAMP DEFAULT NOW()
) RETURNS TABLE (
  analysis_category TEXT,
  finding TEXT,
  impact_assessment TEXT,
  prevention_recommendation TEXT
) AS $$
BEGIN
  -- Lock wait patterns during the incident
  RETURN QUERY
  SELECT
    'Lock Patterns'::TEXT,
    format('Peak waiting sessions: %s', MAX(waiting_locks)),
    CASE
      WHEN MAX(waiting_locks) > 20 THEN 'High impact - system-wide slowdown likely'
      WHEN MAX(waiting_locks) > 10 THEN 'Medium impact - affected multiple users'
      ELSE 'Low impact - isolated incident'
    END::TEXT,
    'Implement consistent lock ordering for affected tables'::TEXT
  FROM lock_statistics
  WHERE recorded_at BETWEEN incident_start AND incident_end
    AND waiting_locks > 0;

  -- Application patterns
  RETURN QUERY
  SELECT
    'Application Impact'::TEXT,
    format('Applications affected: %s', string_agg(DISTINCT psa.application_name, ', ')),
    'Multiple applications involved - suggests systemic issue'::TEXT,
    'Review transaction boundaries across all applications'::TEXT
  FROM pg_stat_activity psa
  WHERE psa.backend_start BETWEEN incident_start AND incident_end;

  -- Timeline analysis
  RETURN QUERY
  SELECT
    'Timeline'::TEXT,
    format('Incident duration: %s', incident_end - incident_start),
    CASE
      WHEN incident_end - incident_start > INTERVAL '5 minutes'
      THEN 'Extended outage - immediate action required'
      ELSE 'Brief incident - monitor for recurrence'
    END::TEXT,
    'Implement faster deadlock detection timeout'::TEXT;
END;
$$ LANGUAGE plpgsql;
9. Best Practices Summary
9.1 Development Guidelines
-- Code review checklist
CREATE OR REPLACE FUNCTION deadlock_code_review_checklist()
RETURNS TABLE (
  category TEXT,
  checkpoint TEXT,
  description TEXT,
  sql_example TEXT
) AS $$
BEGIN
  RETURN QUERY VALUES
    ('Lock Ordering', 'Consistent Resource Access',
     'Always access resources (tables, rows) in the same order across all code paths',
     'SELECT * FROM accounts WHERE id IN (1,2) ORDER BY id FOR UPDATE'),
    ('Transaction Scope', 'Minimize Transaction Time',
     'Keep transactions as short as possible; avoid I/O within transactions',
     'BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = 1; COMMIT;'),
    ('Error Handling', 'Deadlock Retry Logic',
     'Implement exponential backoff retry for deadlock errors',
     'EXCEPTION WHEN deadlock_detected THEN PERFORM pg_sleep(random() * power(2, attempt));'),
    ('Locking Strategy', 'Use Appropriate Lock Levels',
     'Use the least restrictive lock level that maintains data integrity',
     'SELECT * FROM inventory WHERE product_id = ? FOR UPDATE NOWAIT'),
    ('Index Design', 'Support Query Patterns',
     'Ensure indexes support your query patterns to minimize lock duration',
     'CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date)'),
    ('Advisory Locks', 'Custom Synchronization',
     'Use advisory locks for application-level synchronization instead of table locks',
     'SELECT pg_advisory_lock(hashtext(''resource_'' || resource_id))');
END;
$$ LANGUAGE plpgsql;
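The "Deadlock Retry Logic" row deserves a concrete shape. A common client-side pattern is exponential backoff with full jitter for transactions aborted with SQLSTATE 40P01 (deadlock_detected); this Python sketch (function and parameter names are illustrative, not from any particular driver) computes the delay schedule:

```python
import random


def backoff_delays(max_attempts=5, base=0.05, cap=2.0, rng=random.random):
    """Delay schedule (in seconds) for retrying a deadlocked transaction:
    exponential growth with full jitter, capped so retries never stall
    the caller for long. rng is injectable for testing."""
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)  # full jitter: uniform in [0, ceiling)
    return delays
```

The jitter matters: if two deadlocked transactions retry on identical schedules, they are likely to collide again.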
9.2 Operations Guidelines
-- Operational monitoring checklist
CREATE OR REPLACE FUNCTION operational_monitoring_checklist()
RETURNS TABLE (
  frequency TEXT,
  check_name TEXT,
  action TEXT,
  alert_threshold TEXT
) AS $$
BEGIN
  RETURN QUERY VALUES
    ('Real-time', 'Active Lock Waits',
     'Monitor pg_locks for waiting sessions', '> 5 waiting sessions'),
    ('Every minute', 'Long Running Transactions',
     'Check for transactions running longer than expected', '> 2 minutes'),
    ('Every 5 minutes', 'Deadlock Detection Timeout',
     'Verify deadlock_timeout is appropriate for workload', 'Current: 1s'),
    ('Hourly', 'Lock Statistics Trends',
     'Review lock_statistics table for patterns', 'Increasing wait times'),
    ('Daily', 'Deadlock Incident Analysis',
     'Review deadlock_incidents for patterns', '> 0 incidents'),
    ('Weekly', 'Index Contention Analysis',
     'Analyze index usage and contention patterns', 'High read/fetch ratio');
END;
$$ LANGUAGE plpgsql;
10. Performance Optimization Strategies
10.1 Connection Pool Tuning
-- Generate connection pool recommendations
CREATE OR REPLACE FUNCTION connection_pool_recommendations()
RETURNS TABLE (
  pool_parameter TEXT,
  current_load_pattern TEXT,
  recommended_value TEXT,
  rationale TEXT
) AS $$
DECLARE
  avg_connections INTEGER;
  avg_transaction_time INTERVAL;
  idle_connections INTEGER;
BEGIN
  -- Gather current metrics
  SELECT COUNT(*),
         AVG(NOW() - xact_start),
         COUNT(*) FILTER (WHERE state = 'idle')
  INTO avg_connections, avg_transaction_time, idle_connections
  FROM pg_stat_activity
  WHERE state IS NOT NULL;

  RETURN QUERY VALUES
    ('max_pool_size',
     format('%s avg connections', avg_connections),
     CASE
       WHEN avg_connections > 100 THEN '50-75'
       WHEN avg_connections > 50 THEN '25-35'
       ELSE '10-20'
     END,
     'Based on current connection patterns and deadlock reduction'),
    ('connection_timeout',
     format('%s avg transaction time', avg_transaction_time),
     CASE
       WHEN avg_transaction_time > INTERVAL '1 minute' THEN '60s'
       WHEN avg_transaction_time > INTERVAL '10 seconds' THEN '30s'
       ELSE '15s'
     END,
     'Prevent long-running transactions from holding connections'),
    ('pool_mode',
     format('%s idle connections', idle_connections),
     CASE
       WHEN idle_connections > avg_connections * 0.3 THEN 'transaction'
       ELSE 'session'
     END,
     'Optimize based on connection reuse patterns');
END;
$$ LANGUAGE plpgsql;
10.2 Query Optimization for Deadlock Prevention
-- Identify queries that may cause deadlocks (requires pg_stat_statements)
CREATE OR REPLACE FUNCTION identify_deadlock_prone_queries()
RETURNS TABLE (
  query_pattern TEXT,
  frequency BIGINT,
  avg_duration_ms NUMERIC,
  tables_accessed TEXT[],
  lock_risk_score INTEGER,
  optimization_suggestion TEXT
) AS $$
BEGIN
  RETURN QUERY
  WITH query_analysis AS (
    SELECT
      regexp_replace(pss.query, '\s+', ' ', 'g') AS normalized_query,
      SUM(pss.calls)::BIGINT AS call_count,
      AVG(pss.mean_exec_time)::NUMERIC AS mean_ms,
      -- Simplified placeholder; real table extraction needs query parsing
      ARRAY['unknown'] AS accessed,
      CASE
        WHEN pss.query ~* 'UPDATE.*UPDATE' THEN 8                -- multiple updates
        WHEN pss.query ~* 'SELECT.*FOR UPDATE.*ORDER BY' THEN 6  -- ordered locking
        WHEN pss.query ~* 'INSERT.*SELECT.*JOIN' THEN 7          -- complex inserts
        WHEN pss.query ~* 'DELETE.*EXISTS' THEN 5                -- existence checks
        ELSE 2
      END AS risk_score
    FROM pg_stat_statements pss
    WHERE pss.calls > 10  -- only frequently executed queries
    GROUP BY pss.query
  )
  SELECT
    LEFT(qa.normalized_query, 100)::TEXT,
    qa.call_count,
    ROUND(qa.mean_ms, 2),
    qa.accessed,
    qa.risk_score,
    CASE
      WHEN qa.risk_score >= 8 THEN 'Break into smaller transactions'
      WHEN qa.risk_score >= 6 THEN 'Add consistent ordering clauses'
      WHEN qa.risk_score >= 4 THEN 'Consider optimistic locking'
      ELSE 'Monitor for patterns'
    END::TEXT
  FROM query_analysis qa
  ORDER BY qa.risk_score DESC, qa.call_count DESC
  LIMIT 20;
END;
$$ LANGUAGE plpgsql;
11. Conclusion and Maintenance
11.1 Long-term Maintenance Strategy
-- Automated maintenance schedule (requires the pg_cron extension).
-- The job bodies use a distinct dollar-quote tag ($job$) so they do not
-- terminate the surrounding function body, and statements that cannot
-- share a transaction (VACUUM, REFRESH/REINDEX CONCURRENTLY) are
-- scheduled as separate jobs.
CREATE OR REPLACE FUNCTION schedule_deadlock_maintenance()
RETURNS TEXT AS $func$
BEGIN
  -- Daily cleanup
  PERFORM cron.schedule('deadlock-daily-cleanup', '0 2 * * *', $job$
    DELETE FROM lock_statistics WHERE recorded_at < NOW() - INTERVAL '30 days';
    DELETE FROM deadlock_incidents WHERE occurred_at < NOW() - INTERVAL '90 days';
  $job$);
  PERFORM cron.schedule('deadlock-daily-refresh', '30 2 * * *',
    $job$REFRESH MATERIALIZED VIEW CONCURRENTLY account_balances;$job$);

  -- Weekly analysis
  PERFORM cron.schedule('deadlock-weekly-analysis', '0 3 * * 0', $job$
    INSERT INTO maintenance_log (task_name, results, created_at)
    SELECT 'weekly_deadlock_analysis',
           jsonb_agg(to_jsonb(analysis)) AS results,
           NOW()
    FROM analyze_deadlock_patterns(7) analysis;
  $job$);

  -- Monthly optimization (one statement per job)
  PERFORM cron.schedule('deadlock-monthly-vacuum-stats', '0 4 1 * *',
    $job$VACUUM ANALYZE lock_statistics;$job$);
  PERFORM cron.schedule('deadlock-monthly-vacuum-incidents', '15 4 1 * *',
    $job$VACUUM ANALYZE deadlock_incidents;$job$);
  PERFORM cron.schedule('deadlock-monthly-reindex', '30 4 1 * *',
    $job$REINDEX INDEX CONCURRENTLY idx_account_events_account_sequence;$job$);

  RETURN 'Maintenance schedule configured successfully';
END;
$func$ LANGUAGE plpgsql;
11.2 Success Metrics
Monitor these key metrics to measure the effectiveness of your deadlock prevention strategy:
- Deadlock Frequency: Target < 0.1 deadlocks per 1000 transactions
- Average Lock Wait Time: Target < 100ms for 95th percentile
- Transaction Duration: Target < 1 second for 99th percentile
- Long-running Transaction Count: Target < 5 transactions exceeding 30 seconds
- System Recovery Time: Target < 30 seconds for deadlock resolution
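The deadlock-frequency metric and the PASS/WARNING/FAIL banding reduce to simple arithmetic over counters like those in pg_stat_database. A small Python sketch (function names hypothetical) makes the computation explicit:

```python
def deadlocks_per_1000(deadlock_count, committed, rolled_back):
    """Deadlock incidents per 1000 transactions, from
    pg_stat_database-style commit/rollback counters."""
    total = committed + rolled_back
    if total == 0:
        return 0.0
    return deadlock_count / total * 1000


def metric_status(current, target):
    """PASS/WARNING/FAIL banding for lower-is-better metrics:
    WARNING up to 1.5x the target, FAIL beyond that."""
    if current <= target:
        return "PASS"
    if current <= target * 1.5:
        return "WARNING"
    return "FAIL"
```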
-- Success metrics dashboard
CREATE OR REPLACE VIEW deadlock_success_metrics AS
WITH metrics AS (
  SELECT
    'Deadlock Frequency' AS metric_name,
    COALESCE(
      (SELECT COUNT(*) FROM deadlock_incidents
       WHERE occurred_at >= NOW() - INTERVAL '24 hours'), 0
    )::NUMERIC
      / NULLIF(
          (SELECT SUM(xact_commit + xact_rollback)
           FROM pg_stat_database
           WHERE datname = current_database()), 0
        ) * 1000 AS current_value,
    0.1 AS target_value,
    'per 1000 transactions' AS unit

  UNION ALL

  SELECT
    'Average Lock Wait Time',
    COALESCE(AVG(avg_wait_duration_ms), 0),
    100.0,
    'milliseconds'
  FROM lock_statistics
  WHERE recorded_at >= NOW() - INTERVAL '1 hour'

  UNION ALL

  SELECT
    'Long Running Transactions',
    COUNT(*)::NUMERIC,
    5.0,
    'transactions > 30s'
  FROM pg_stat_activity
  WHERE NOW() - xact_start > INTERVAL '30 seconds'
    AND state = 'active'
)
SELECT
  metric_name,
  ROUND(current_value, 2) AS current_value,
  target_value,
  unit,
  CASE
    WHEN current_value <= target_value THEN 'PASS'
    WHEN current_value <= target_value * 1.5 THEN 'WARNING'
    ELSE 'FAIL'
  END AS status,
  ROUND((target_value - current_value) / target_value * 100, 1) AS performance_pct
FROM metrics;
This comprehensive guide provides a complete framework for PostgreSQL deadlock prevention in high-concurrency environments. Regular monitoring, proactive prevention strategies, and quick incident response will help maintain system reliability and performance while minimizing the impact of deadlocks on your critical business operations.
Remember that deadlock prevention is an ongoing process that requires continuous monitoring, analysis, and optimization based on your specific workload patterns and business requirements.