Introduction
Database reliability is critical for modern applications. A single database issue can cascade into application downtime, data loss, or degraded user experience. While PostgreSQL is renowned for its robustness, proper monitoring is essential to maintain optimal performance and prevent issues before they impact your users.
This comprehensive guide covers 50+ carefully designed Prometheus alerts that monitor every critical aspect of your PostgreSQL infrastructure, from basic availability to advanced performance metrics.
Why Comprehensive PostgreSQL Monitoring Matters
PostgreSQL databases require monitoring across multiple dimensions:
- Availability: Is the database accessible and responding?
- Performance: Are queries executing efficiently?
- Capacity: Are we approaching resource limits?
- Data Integrity: Is maintenance keeping the database healthy?
- Security: Are there unauthorized access attempts?
Without proper alerts, you might discover problems only after users report issues. With the right monitoring strategy, you can detect and resolve problems proactively.
Architecture Overview
Our monitoring solution uses:
- postgres_exporter: Collects metrics from PostgreSQL
- Prometheus: Stores metrics and evaluates alert rules
- Alertmanager: Routes alerts to appropriate channels
The 50+ alerts are organized into 11 categories, each targeting specific aspects of database health.
1. Availability Monitoring (3 Alerts)
Why It Matters
If your database is down, nothing else matters. Availability monitoring ensures you’re immediately notified of any connectivity issues.
Critical Alerts
PostgresExporterDown (Critical – 2m)
- Purpose: Detects when Prometheus cannot scrape metrics
- Why It Matters: Loss of metrics means blind spots in monitoring
- Action: Check exporter process, network connectivity, and Prometheus configuration
PostgreSQLDown (Critical – 1m)
- Purpose: Detects when the database is unreachable
- Why It Matters: Complete service outage
- Action: Check PostgreSQL service status, authentication, network, and logs
PostgreSQLRestarted (Warning – 5m)
- Purpose: Tracks recent database restarts
- Why It Matters: Unexpected restarts may indicate crashes or configuration issues
- Action: Review PostgreSQL logs for crash reasons, check for OOM kills
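As a concrete sketch, the first two availability alerts above might look like the following Prometheus rules. This assumes a default postgres_exporter setup (which exposes the `pg_up` gauge) and a scrape job named `postgres-exporter`; adjust the job label and annotations to match your configuration:

```yaml
groups:
  - name: postgresql-availability
    rules:
      # Prometheus cannot scrape the exporter at all
      - alert: PostgresExporterDown
        expr: up{job="postgres-exporter"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "postgres_exporter unreachable on {{ $labels.instance }}"

      # The exporter is up, but cannot reach PostgreSQL
      - alert: PostgreSQLDown
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL is unreachable on {{ $labels.instance }}"
```

A restart alert can be built on the same exporter's `pg_postmaster_start_time_seconds` metric by comparing it against `time()`.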
2. Connection Management (6 Alerts)
Why It Matters
Connection exhaustion is one of the most common causes of database outages. Too few available connections mean new users cannot connect, while connection leaks waste resources.
Key Alerts
PostgreSQLHighConnections (Warning – 80% threshold)
- Purpose: Early warning before connection exhaustion
- Threshold: 80% of max_connections used for 3 minutes
- Action: Review connection pooling configuration, identify connection-heavy applications
PostgreSQLConnectionsExhausted (Critical – 95% threshold)
- Purpose: Imminent connection exhaustion
- Threshold: 95% of max_connections used for 1 minute
- Action: Immediate action needed – increase max_connections or kill idle connections
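The two connection-saturation alerts above can be sketched as ratio rules. This assumes the default postgres_exporter metrics `pg_stat_activity_count` (connections by database and state) and `pg_settings_max_connections`; metric names may differ if you have customized the exporter's collectors:

```yaml
- alert: PostgreSQLHighConnections
  expr: |
    sum(pg_stat_activity_count) by (instance)
      / sum(pg_settings_max_connections) by (instance) > 0.80
  for: 3m
  labels:
    severity: warning

- alert: PostgreSQLConnectionsExhausted
  expr: |
    sum(pg_stat_activity_count) by (instance)
      / sum(pg_settings_max_connections) by (instance) > 0.95
  for: 1m
  labels:
    severity: critical
```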
PostgreSQLIdleInTransaction (Warning)
- Purpose: Detects connections holding transactions open while idle
- Threshold: More than 5 idle-in-transaction connections for 5 minutes
- Why It Matters: Blocks VACUUM, holds locks, wastes resources
- Action: Find and fix application code that doesn’t commit/rollback properly
PostgreSQLConnectionLeaks (Warning)
- Purpose: Identifies potential connection pool leaks
- Threshold: Single user has more than 20 idle connections for 30 minutes
- Action: Review application connection pool configuration, check for connection leaks in code
PostgreSQLTooManyConnections (Warning)
- Purpose: Per-database connection monitoring
- Threshold: More than 100 connections to a single database
- Action: Investigate why one database has excessive connections
PostgreSQLBackendProcesses (Warning)
- Purpose: Monitors total active backend processes
- Threshold: More than 100 active processes for 10 minutes
- Why It Matters: High process count can indicate performance problems
- Action: Identify slow queries, review query patterns
Best Practices
- Use connection pooling (PgBouncer, pgpool-II)
- Set appropriate max_connections based on workload
- Configure idle_in_transaction_session_timeout
- Monitor per-application connection usage
3. Query Performance (7 Alerts)
Why It Matters
Slow queries and long-running transactions are primary causes of database performance degradation. They consume resources, block other queries, and degrade user experience.
Transaction Monitoring
PostgreSQLLongRunningTransactions (Warning – 10 minutes)
- Purpose: Detects transactions running longer than expected
- Why It Matters: Long transactions hold locks and prevent VACUUM
- Action: Identify the query, check if it’s expected behavior, consider killing if necessary
PostgreSQLLongRunningTransactionsCritical (Critical – 1 hour)
- Purpose: Critical threshold for transaction duration
- Why It Matters: Likely blocking other operations, preventing maintenance
- Action: Immediate investigation and likely termination required
PostgreSQLLongRunningQueries (Warning – 5 minutes)
- Purpose: Monitors actively executing queries
- Action: Review query execution plan, add indexes, optimize query
Query Rate Monitoring
PostgreSQLSlowQueries (Warning)
- Purpose: Tracks average query execution time
- Threshold: Average execution time over 5 seconds for 3 minutes
- Action: Identify slow queries using pg_stat_statements, optimize or add indexes
PostgreSQLHighQueryRate (Warning)
- Purpose: Detects unusually high query throughput
- Threshold: More than 10,000 queries/second
- Why It Matters: May indicate a traffic spike or runaway process
- Action: Check application logs, identify source of queries
PostgreSQLHighRollbackRate (Warning)
- Purpose: Monitors transaction rollback percentage
- Threshold: More than 5% of transactions rolling back
- Why It Matters: Indicates application errors or deadlock issues
- Action: Review application logs, check for deadlocks or constraint violations
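A rollback-rate rule like the one described above can be expressed as a ratio of the default `pg_stat_database_xact_rollback` and `pg_stat_database_xact_commit` counters (a sketch; the 5-minute rate window is an assumption you may want to tune):

```yaml
- alert: PostgreSQLHighRollbackRate
  expr: |
    sum(rate(pg_stat_database_xact_rollback[5m])) by (datname)
      /
    sum(rate(pg_stat_database_xact_commit[5m])
        + rate(pg_stat_database_xact_rollback[5m])) by (datname)
      > 0.05
  for: 5m
  labels:
    severity: warning
```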
PostgreSQLQueryCancellation (Warning)
- Purpose: Tracks queries cancelled due to conflicts
- Why It Matters: Indicates replication conflicts or statement timeout issues
- Action: Review conflict reasons, adjust max_standby_streaming_delay
Optimization Tips
- Use the pg_stat_statements extension for query analysis
- Set appropriate statement_timeout values
- Implement query result caching
- Use EXPLAIN ANALYZE to understand query plans
- Consider query optimization or adding indexes
4. Cache & Buffer Performance (6 Alerts)
Why It Matters
Effective caching is crucial for database performance. Poor cache hit ratios mean excessive disk I/O, while excessive temp file usage indicates insufficient memory allocation.
PostgreSQLLowCacheHitRatio (Warning – <90%)
- Purpose: Monitors shared buffer cache effectiveness
- Threshold: Cache hit ratio below 90% for 10 minutes
- Why It Matters: Low hit ratio means frequent disk reads, poor performance
- Action: Consider increasing shared_buffers, analyze query patterns
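The cache hit ratio alert can be computed from the default `pg_stat_database_blks_hit` and `pg_stat_database_blks_read` counters. A minimal sketch (rate window is an assumption; very quiet databases may need a guard against division by zero):

```yaml
- alert: PostgreSQLLowCacheHitRatio
  expr: |
    sum(rate(pg_stat_database_blks_hit[5m])) by (datname)
      /
    sum(rate(pg_stat_database_blks_hit[5m])
        + rate(pg_stat_database_blks_read[5m])) by (datname)
      < 0.90
  for: 10m
  labels:
    severity: warning
```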
PostgreSQLHighDiskIO (Warning)
- Purpose: Tracks disk read rate
- Threshold: More than 1,000 blocks/second read from disk
- Action: Investigate queries causing high I/O, optimize or cache results
PostgreSQLTooManyTempFiles (Warning)
- Purpose: Monitors temporary file creation rate
- Threshold: More than 10 temp files/second
- Why It Matters: Indicates insufficient work_mem for sorts/hashes
- Action: Increase work_mem or optimize queries to reduce sorting
PostgreSQLTempFileUsage (Warning)
- Purpose: Tracks volume of temporary file usage
- Threshold: More than 100MB/second in temp files
- Action: Increase work_mem, optimize queries, add indexes
PostgreSQLWriteActivity (Warning)
- Purpose: Monitors backend direct writes vs checkpoint writes
- Threshold: Backend writes exceed 2x checkpoint writes for 15 minutes
- Why It Matters: Indicates background writer isn’t keeping up
- Action: Tune bgwriter_delay, bgwriter_lru_maxpages, and bgwriter_lru_multiplier
PostgreSQLMaxWrittenClean (Warning)
- Purpose: Detects when background writer hits its write limit
- Why It Matters: Background writer stops prematurely, causing backend writes
- Action: Increase bgwriter_lru_maxpages
Configuration Recommendations
-- Typical production settings
shared_buffers = 25% of RAM (up to 8-16GB)
work_mem = 32-256MB (depends on query complexity)
maintenance_work_mem = 1-2GB
effective_cache_size = 75% of RAM
5. Locks & Deadlocks (3 Alerts)
Why It Matters
Lock contention and deadlocks are critical issues that can bring your application to a halt. These alerts help identify and resolve concurrency problems quickly.
PostgreSQLDeadlocks (Critical)
- Purpose: Immediate notification of deadlock occurrence
- Threshold: Any deadlock detected within 1 minute
- Why It Matters: Deadlocks cause transaction failures
- Action: Review deadlock logs, redesign transaction order, add retry logic
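Since `pg_stat_database_deadlocks` is a default exporter counter, the deadlock alert above can be a simple increase check (a sketch; `increase()` over 1 minute fires on any new deadlock):

```yaml
- alert: PostgreSQLDeadlocks
  expr: increase(pg_stat_database_deadlocks[1m]) > 0
  labels:
    severity: critical
  annotations:
    summary: "Deadlock detected in database {{ $labels.datname }}"
```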
PostgreSQLBlockedQueries (Warning)
- Purpose: Identifies queries waiting on locks
- Threshold: More than 10 queries waiting on exclusive locks for 2 minutes
- Why It Matters: Indicates lock contention, potential performance degradation
- Action: Identify blocking queries using pg_locks and pg_stat_activity
PostgreSQLLockTablesFull (Warning)
- Purpose: Monitors lock table capacity
- Threshold: Lock table usage above 80%
- Why It Matters: Exhausting lock table prevents new locks
- Action: Increase max_locks_per_transaction
Investigating Locks
-- Find blocking queries (simplified: matches only locks on the same relation)
SELECT blocked_locks.pid AS blocked_pid,
       blocking_locks.pid AS blocking_pid,
       blocked_activity.query AS blocked_query,
       blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
 AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
 AND blocking_locks.pid != blocked_locks.pid
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted AND blocking_locks.granted;
6. Storage & WAL Management (9 Alerts)
Why It Matters
Storage issues can cause database corruption, outages, or performance degradation. Monitoring disk usage, bloat, and WAL generation is essential for database health.
Disk Usage Alerts
PostgreSQLDiskUsageHigh (Warning)
- Purpose: Tracks database size growth
- Threshold: Database exceeds 100GB
- Action: Plan for storage expansion, archive old data
PostgreSQLDatabaseSizeGrowth (Warning)
- Purpose: Monitors growth rate
- Threshold: Growing more than 10GB per day for 2 hours
- Action: Investigate sudden growth, review data retention policies
PostgreSQLTableSizeGrowth (Info)
- Purpose: Tracks table insert rates
- Threshold: More than 1 million inserts per day
- Action: Monitor for capacity planning
Table Bloat Alerts
PostgreSQLTableBloat (Warning – 20%)
- Purpose: Detects moderate table bloat
- Threshold: More than 20% dead tuples for 30 minutes
- Why It Matters: Bloat wastes disk space and slows queries
- Action: Run VACUUM or schedule more aggressive autovacuum
PostgreSQLTableBloatCritical (Critical – 50%)
- Purpose: Critical bloat level
- Threshold: More than 50% dead tuples for 15 minutes
- Action: Immediate VACUUM FULL may be needed (requires table lock)
PostgreSQLIndexBloat (Warning)
- Purpose: Identifies large unused indexes
- Threshold: Index over 100MB with no scans in 7 days
- Action: Consider dropping unused indexes to save space
WAL Management
PostgreSQLWALGrowthHigh (Warning)
- Purpose: Monitors WAL generation rate
- Threshold: More than 100MB/second WAL generation
- Why It Matters: High WAL generation can overwhelm archiving/replication
- Action: Investigate bulk operations, tune checkpoint settings
PostgreSQLTooManyWALFiles (Warning)
- Purpose: Detects WAL file accumulation
- Threshold: More than 100 WAL files exist
- Why It Matters: Indicates archiving problems or replication lag
- Action: Check archive_command, verify replication slots
Sequence Management
PostgreSQLSequenceExhaustion (Warning – 80%)
- Purpose: Prevents sequence exhaustion
- Threshold: Sequence is 80% consumed for 1 hour
- Action: Plan sequence migration to BIGINT if needed
PostgreSQLSequenceExhaustionCritical (Critical – 95%)
- Purpose: Urgent sequence exhaustion warning
- Action: Immediate migration required
PostgreSQLUnusedIndexes (Info)
- Purpose: Identifies optimization opportunities
- Threshold: Index over 10MB never used in 7 days
- Action: Analyze if index is truly needed, consider dropping
7. Replication Monitoring (4 Alerts)
Why It Matters
For high-availability setups, replication lag can lead to stale reads, data loss during failover, or replication slot disk exhaustion.
PostgreSQLReplicationLag (Warning – 30 seconds)
- Purpose: Detects minor replication delays
- Why It Matters: Replicas serve stale data
- Action: Check network, master load, replica resources
PostgreSQLReplicationLagCritical (Critical – 5 minutes)
- Purpose: Critical replication delay
- Why It Matters: Significant data staleness, failover risk
- Action: Immediate investigation of replication bottleneck
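The two lag alerts above can be sketched against the exporter's replication lag gauge. Note the metric name varies by postgres_exporter version (`pg_replication_lag` in older releases, `pg_replication_lag_seconds` in newer ones), so verify which your installation exposes:

```yaml
- alert: PostgreSQLReplicationLag
  expr: pg_replication_lag > 30
  for: 2m
  labels:
    severity: warning

- alert: PostgreSQLReplicationLagCritical
  expr: pg_replication_lag > 300
  for: 1m
  labels:
    severity: critical
```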
PostgreSQLReplicationSlotInactive (Warning)
- Purpose: Detects inactive replication slots
- Threshold: Slot inactive for 5 minutes
- Why It Matters: Inactive slots can fill disk with WAL files
- Action: Verify replica status, remove stale slots if needed
PostgreSQLReplicationSlotLag (Warning)
- Purpose: Monitors WAL retention for slots
- Threshold: Slot has more than 1GB of WAL lag
- Action: Investigate why slot is falling behind
Replication Best Practices
- Monitor replication lag on all replicas
- Set appropriate max_wal_senders and max_replication_slots
- Use physical replication slots for streaming replication
- Implement monitoring on both primary and replicas
- Test failover procedures regularly
8. Vacuum & Maintenance (6 Alerts)
Why It Matters
VACUUM is PostgreSQL’s housekeeping mechanism. Without proper vacuuming, you’ll experience bloat, performance degradation, and eventually transaction ID wraparound failure.
VACUUM Monitoring
PostgreSQLTableNotVacuumed (Warning – 24 hours)
- Purpose: Ensures regular vacuuming occurs
- Why It Matters: Unvacuumed tables accumulate dead tuples
- Action: Check autovacuum settings, run manual VACUUM if needed
PostgreSQLTooManyDeadTuples (Warning – 10%)
- Purpose: Tracks dead tuple accumulation
- Threshold: More than 10% dead tuples for 10 minutes
- Why It Matters: Indicates autovacuum falling behind
- Action: Tune autovacuum parameters or run manual VACUUM
Statistics Maintenance
PostgreSQLTableNotAnalyzed (Warning)
- Purpose: Detects outdated table statistics
- Threshold: Not analyzed in 24 hours with 1,000+ modifications
- Why It Matters: Outdated stats lead to poor query plans
- Action: Run ANALYZE on affected tables
PostgreSQLOutdatedStatistics (Critical)
- Purpose: Critical statistics staleness
- Threshold: Not analyzed in 3 days with 5,000+ modifications
- Why It Matters: Query planner uses severely outdated information
- Action: Immediate ANALYZE needed, review autovacuum_analyze settings
Transaction ID Management
PostgreSQLTransactionIDWraparound (Critical)
- Purpose: Prevents catastrophic wraparound shutdown
- Threshold: Transaction age exceeds 1.5 billion
- Why It Matters: PostgreSQL will shut down to prevent data loss at 2 billion
- Action: Emergency VACUUM FREEZE required on affected database
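postgres_exporter does not expose transaction age by default, so a rule like the one above requires a custom query in the exporter's queries file. The metric name below (`pg_database_age_datfrozenxid`, i.e. `age(datfrozenxid)` per database) is a hypothetical example, not a built-in metric:

```yaml
# Assumes a custom exporter query exposing the hypothetical metric
# pg_database_age_datfrozenxid = age(datfrozenxid) per database.
- alert: PostgreSQLTransactionIDWraparound
  expr: pg_database_age_datfrozenxid > 1500000000
  labels:
    severity: critical
  annotations:
    summary: "Transaction ID age critical in {{ $labels.datname }} - VACUUM FREEZE required"
```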
PostgreSQLPreparedTransactionAge (Critical)
- Purpose: Detects old prepared transactions
- Threshold: Prepared transaction older than 1 hour
- Why It Matters: Blocks VACUUM from reclaiming space
- Action: Commit or rollback the prepared transaction
Autovacuum Configuration
-- Aggressive autovacuum settings for busy databases
autovacuum_vacuum_scale_factor = 0.05
autovacuum_vacuum_cost_limit = 1000
autovacuum_vacuum_cost_delay = 10ms
autovacuum_naptime = 10s
autovacuum_max_workers = 4
9. Configuration Management (2 Alerts)
PostgreSQLConfigurationChanged (Info)
- Purpose: Tracks pending configuration changes
- Threshold: Changes pending restart for 1 hour
- Action: Plan maintenance window for restart
PostgreSQLSharedBuffersTooLow (Info)
- Purpose: Identifies suboptimal memory configuration
- Threshold: shared_buffers less than 128MB
- Action: Consider increasing for better performance
10. Security Monitoring (2 Alerts)
Why It Matters
Security alerts help detect unauthorized access attempts, brute force attacks, or misuse of privileged accounts.
PostgreSQLTooManyAuthenticationFailures (Warning)
- Purpose: Detects potential brute force attacks
- Threshold: More than 5 authentication failures per second
- Action: Review logs, block suspicious IPs, strengthen authentication
PostgreSQLSuperuserConnections (Info)
- Purpose: Monitors superuser account usage
- Threshold: More than 5 concurrent postgres superuser connections
- Why It Matters: Superuser should be used sparingly
- Action: Review why superuser access is needed, use regular users when possible
11. System Health (3 Alerts)
PostgreSQLHighErrorRate (Warning)
- Purpose: Detects elevated error rates
- Threshold: Rollback rate exceeds 10% for 5 minutes
- Action: Review application logs for errors
PostgreSQLCheckpointsTooFrequent (Warning)
- Purpose: Identifies checkpoint tuning opportunities
- Threshold: Requested checkpoints exceed timed checkpoints
- Why It Matters: Frequent checkpoints cause I/O spikes
- Action: Increase checkpoint_timeout or max_wal_size
PostgreSQLCheckpointDuration (Warning)
- Purpose: Monitors checkpoint performance
- Threshold: Checkpoints taking more than 300 seconds on average
- Why It Matters: Long checkpoints cause performance degradation
- Action: Spread checkpoint I/O with checkpoint_completion_target = 0.9
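The checkpoint-frequency alert compares requested vs. timed checkpoints from `pg_stat_bgwriter`. A sketch, assuming the newer exporter metric names with `_total` suffixes (older versions omit the suffix):

```yaml
- alert: PostgreSQLCheckpointsTooFrequent
  expr: |
    rate(pg_stat_bgwriter_checkpoints_req_total[30m])
      > rate(pg_stat_bgwriter_checkpoints_timed_total[30m])
  for: 15m
  labels:
    severity: warning
```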
Implementation Guide
Step 1: Install postgres_exporter
# Using Docker
docker run -d \
--name postgres-exporter \
-e DATA_SOURCE_NAME="postgresql://user:password@localhost:5432/postgres?sslmode=disable" \
-p 9187:9187 \
prometheuscommunity/postgres-exporter
# Or install directly
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.15.0/postgres_exporter-0.15.0.linux-amd64.tar.gz
tar xvfz postgres_exporter-*.tar.gz
cd postgres_exporter-*/
./postgres_exporter
Step 2: Configure Prometheus
Add to your prometheus.yml:
scrape_configs:
  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['localhost:9187']
        labels:
          instance: 'production-db-1'
Step 3: Add Alert Rules
Create postgres_alerts.yml with the provided rules and reference it in Prometheus configuration:
rule_files:
  - "postgres_alerts.yml"
Step 4: Configure Alertmanager
Setup alert routing in alertmanager.yml:
route:
  group_by: ['alertname', 'instance']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'team-database'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
    - match:
        severity: warning
      receiver: 'slack'

receivers:
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: '<your-key>'
  - name: 'slack'
    slack_configs:
      - api_url: '<your-webhook>'
        channel: '#database-alerts'
Step 5: Test Your Alerts
# Reload Prometheus configuration
curl -X POST http://localhost:9090/-/reload
# Trigger a test alert
# For example, create many connections to trigger PostgreSQLHighConnections
Alert Tuning Guidelines
Adjust Thresholds for Your Environment
The default thresholds work for many environments but may need adjustment:
High-Traffic Databases
- Increase connection thresholds if you legitimately use 80%+ of connections
- Adjust query rate alerts for your normal traffic patterns
- Consider more aggressive bloat thresholds
Development Environments
- Relax some warning thresholds
- Consider disabling info-level alerts
- Increase time windows to reduce noise
Small Databases
- Lower disk usage thresholds
- Adjust connection counts for smaller max_connections values
- Consider different cache hit ratio expectations
Reducing Alert Fatigue
- Start Conservative: Begin with higher thresholds and longer time windows
- Monitor for Patterns: Look for alerts that fire frequently but don’t indicate real problems
- Tune Gradually: Adjust thresholds based on observed behavior
- Use Severity Appropriately: Reserve “critical” for true emergencies
- Group Related Alerts: Use Alertmanager grouping to avoid duplicate notifications
Alert Priority Matrix
| Severity | Response Time | Examples |
|---|---|---|
| Critical | Immediate (24/7) | Database down, connection exhaustion, transaction wraparound |
| Warning | Business hours | High connections, slow queries, bloat |
| Info | Review regularly | Configuration changes, unused indexes |
Monitoring Dashboard
Consider creating Grafana dashboards to visualize these metrics:
Key Dashboard Panels
- Availability Overview
- Database uptime
- Exporter health
- Connection status
- Connection Health
- Active connections (gauge)
- Connection pool usage (%)
- Idle in transaction count
- Query Performance
- Query rate (queries/sec)
- Average query duration
- Long-running transaction count
- Cache Performance
- Cache hit ratio (%)
- Disk read rate
- Temp file usage
- Replication Status
- Replication lag (seconds)
- WAL lag (bytes)
- Replica connection status
- Maintenance Health
- Last vacuum time per table
- Last analyze time per table
- Transaction ID age
- Dead tuple ratio
Troubleshooting Common Issues
Alert: PostgreSQLConnectionsExhausted
Symptoms: Applications cannot connect to database
Investigation Steps:
-- Check current connections
SELECT datname, count(*)
FROM pg_stat_activity
GROUP BY datname;
-- Find idle connections
SELECT pid, usename, application_name, state, state_change
FROM pg_stat_activity
WHERE state = 'idle'
ORDER BY state_change;
Solutions:
- Implement connection pooling (PgBouncer)
- Increase max_connections (requires restart)
- Kill idle connections: SELECT pg_terminate_backend(pid)
- Fix application connection leaks
Alert: PostgreSQLTableBloatCritical
Symptoms: Slow queries, high disk usage
Investigation Steps:
-- Check table bloat
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
Solutions:
- Run VACUUM on the affected table
- For critical bloat: VACUUM FULL (requires an exclusive lock)
- Tune autovacuum: decrease autovacuum_vacuum_scale_factor
- Check for long-running transactions blocking vacuum
Alert: PostgreSQLReplicationLagCritical
Symptoms: Replicas serving stale data
Investigation Steps:
-- On primary: check replication status
SELECT client_addr, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as pending_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes
FROM pg_stat_replication;
Solutions:
- Check replica CPU/disk/network resources
- Verify the max_wal_senders and max_replication_slots settings
- Check for large transactions on the primary
- Consider adding more replicas to distribute read load
Best Practices Summary
1. Alert Hygiene
- Review and tune alerts regularly
- Remove noisy alerts that don’t indicate actionable issues
- Use appropriate severity levels
- Document expected alert frequency
2. Response Procedures
- Create runbooks for critical alerts
- Define escalation paths
- Test response procedures regularly
- Train team members on alert investigation
3. Continuous Improvement
- Track alert trends over time
- Identify recurring issues
- Automate remediation where possible
- Review alert effectiveness quarterly
4. Integration
- Integrate with incident management (PagerDuty, Opsgenie)
- Send to team collaboration tools (Slack, Teams)
- Create tickets for non-urgent issues (Jira)
- Log all alerts for historical analysis
5. Documentation
- Maintain alert documentation
- Document threshold rationale
- Keep runbooks updated
- Share knowledge across team
Conclusion
Comprehensive PostgreSQL monitoring is essential for maintaining database reliability and performance. These 50+ alerts provide coverage across all critical database dimensions, from basic availability to advanced performance metrics.
Key takeaways:
- Start Simple: Implement availability and connection alerts first
- Tune Progressively: Adjust thresholds based on your environment
- Reduce Noise: Remove alerts that don’t indicate real problems
- Act on Alerts: Ensure every alert has a documented response procedure
- Monitor Trends: Use dashboards to visualize metric trends over time
By implementing this monitoring strategy, you’ll detect issues before they impact users, maintain optimal database performance, and gain deep visibility into your PostgreSQL infrastructure.
Remember: The goal isn’t to eliminate all alerts, but to ensure that every alert represents a real issue requiring attention. With proper tuning and maintenance, these alerts will become an invaluable tool for maintaining database health and reliability.
Additional Resources
- PostgreSQL Official Documentation
- postgres_exporter GitHub
- Prometheus Documentation
- PostgreSQL Wiki – Performance Optimization
- PostgreSQL Monitoring Queries
About This Guide
This guide covers monitoring setup for PostgreSQL 12+ using prometheus-community/postgres_exporter. Alert rules are production-tested and based on real-world database operations experience. Thresholds should be adjusted based on your specific workload and requirements.
An example alerts.yml file used with Prometheus is attached.