What you will achieve
You'll learn to identify the specific metrics and patterns that signal database problems before they impact users. More importantly, you'll understand how to execute a zero-downtime migration when those signals indicate it's time to move to better infrastructure.
Early detection prevents emergency migrations under pressure, which often lead to outages and data inconsistencies.
Prerequisites and assumptions
You need:
- Database monitoring tools (built-in metrics or dedicated monitoring)
- Administrative access to your database server
- Basic understanding of your application's database usage patterns
- Ability to analyze query performance and connection metrics
This guide covers MySQL, PostgreSQL, and general principles that apply to most relational databases.
Step-by-step monitoring implementation
Database problems develop over months, not minutes. The key is measuring the right indicators consistently.
1. Set up connection pool monitoring
Connection exhaustion kills applications faster than slow queries. Monitor active connections against your maximum:
-- MySQL: Check current connections
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';

-- PostgreSQL: Check active connections
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
SHOW max_connections;
Set alerts when active connections exceed 70% of your maximum. At 80%, you're in danger territory.
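The alerting rule above can be expressed as a small pure function; this is a sketch with our own function name, fed by whatever values your monitoring collects:

```python
# Hypothetical helper: classify connection-pool utilization against the
# 70% warning / 80% danger thresholds recommended above.
def connection_alert(active: int, max_connections: int) -> str:
    """Return 'ok', 'warning', or 'danger' for current connection usage."""
    utilization = active / max_connections
    if utilization >= 0.80:
        return "danger"
    if utilization >= 0.70:
        return "warning"
    return "ok"

print(connection_alert(72, 100))  # 72% of max -> "warning"
```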
2. Track query execution time trends
Individual slow queries matter less than trending degradation across all queries:
-- MySQL: Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1.0;
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- PostgreSQL: Track query statistics (requires the pg_stat_statements
-- extension; on PostgreSQL 13+ the columns are total_exec_time and
-- mean_exec_time)
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
Graph average query execution time weekly. A steady upward trend indicates growing data volume or degrading indexes.
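One hedged way to quantify "a steady upward trend" is a least-squares slope over the weekly averages. This sketch assumes you export those averages yourself; the function name and sample values are illustrative:

```python
# Hypothetical sketch: least-squares slope of weekly average query times.
# A consistently positive slope signals the gradual degradation described above.
def weekly_trend_slope(weekly_avgs_ms):
    """Slope in ms per week; positive means queries are getting slower."""
    n = len(weekly_avgs_ms)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(weekly_avgs_ms) / n
    numerator = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(xs, weekly_avgs_ms))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

print(weekly_trend_slope([12.0, 12.4, 13.1, 13.9]))  # positive slope
```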
3. Monitor lock wait times
Lock contention creates cascading slowdowns that affect the entire application:
-- MySQL: Check lock waits
SELECT * FROM performance_schema.events_waits_summary_global_by_event_name
WHERE event_name LIKE '%lock%' AND count_star > 0;

-- PostgreSQL: Monitor lock activity
SELECT mode, locktype, database, relation, granted, COUNT(*)
FROM pg_locks
GROUP BY mode, locktype, database, relation, granted;
Lock waits that regularly exceed 100 ms point to table design problems or overly broad transaction scopes.
4. Measure disk I/O patterns
Database performance ultimately depends on storage performance:
# Linux: Monitor disk utilization
iostat -x 1

# Look for:
#   %util above 80% consistently
#   avgqu-sz (queue size) above 2
#   await times above 20ms
High I/O wait combined with slow queries means your storage can't handle the workload.
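The iostat rules of thumb above collapse into a single check; a sketch with our own function name, applied to values you would parse out of iostat output:

```python
# Hypothetical helper: flag storage saturation using the iostat
# thresholds above (%util > 80, queue size > 2, await > 20 ms).
def io_saturated(util_pct: float, avg_queue: float, await_ms: float) -> bool:
    """True when any disk metric exceeds its rule-of-thumb limit."""
    return util_pct > 80 or avg_queue > 2 or await_ms > 20

print(io_saturated(90.0, 1.0, 5.0))  # True: utilization alone is enough
```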
5. Track memory buffer efficiency
Poor buffer hit ratios force expensive disk reads:
-- MySQL: Buffer pool hit ratio (share of logical reads served
-- without a physical disk read)
SELECT (1 - (
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status
   WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status
   WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
)) * 100 AS hit_ratio;

-- PostgreSQL: Buffer hit ratio
SELECT sum(heap_blks_hit) /
       (sum(heap_blks_hit) + sum(heap_blks_read)) * 100 AS buffer_hit_ratio
FROM pg_statio_user_tables;
Hit ratios below 95% indicate insufficient memory allocation or growing dataset size.
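Both queries reduce to the same arithmetic. As a sketch, using counter values you would read from the database:

```python
# Hypothetical helper: buffer hit ratio from hit/read counters, matching
# the SQL above (requests served from memory vs. reads that hit disk).
def buffer_hit_ratio(hits: int, disk_reads: int) -> float:
    """Percentage of block requests served from memory."""
    return hits / (hits + disk_reads) * 100

print(buffer_hit_ratio(980, 20))  # 98.0 -> healthy; below 95 needs attention
```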
Planning your zero-downtime migration
When monitoring reveals consistent problems, plan your migration before you're forced into emergency mode.
1. Choose your migration strategy
The right approach depends on your database size and acceptable complexity:
Blue-green deployment: Works for smaller databases (under 100GB) where you can maintain two complete environments:
# Set up a read replica (also requires replication credentials and
# binary log coordinates, omitted here)
mysql> CHANGE MASTER TO MASTER_HOST='source-db.example.com';
mysql> START SLAVE;

# Verify replication lag
mysql> SHOW SLAVE STATUS\G

# Note: recent MySQL versions prefer CHANGE REPLICATION SOURCE TO,
# START REPLICA, and SHOW REPLICA STATUS.
Rolling migration: Better for larger databases using logical replication:
-- PostgreSQL logical replication setup
-- On source database
CREATE PUBLICATION migration_pub FOR ALL TABLES;

-- On target database
CREATE SUBSCRIPTION migration_sub
  CONNECTION 'host=source-db.example.com user=replicator dbname=production'
  PUBLICATION migration_pub;
2. Establish data consistency verification
Never migrate without verification. Set up checksums for critical tables:
-- Create verification queries (MySQL example; list the columns that
-- matter for the table you are checking)
SELECT
  'your_table' AS table_name,
  COUNT(*) AS row_count,
  COALESCE(SUM(CRC32(CONCAT_WS('|', col1, col2, col3))), 0) AS checksum
FROM your_table;
Run these queries on both source and target during migration windows.
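Comparing the two result sets can be as simple as diffing (row_count, checksum) pairs per table. A sketch, assuming you have collected each side's results into a dict keyed by table name:

```python
# Hypothetical helper: compare per-table (row_count, checksum) results
# collected from source and target during a migration window.
def mismatched_tables(source: dict, target: dict) -> list:
    """Return tables whose counts or checksums differ (or are missing)."""
    return sorted(t for t, expected in source.items()
                  if target.get(t) != expected)

source = {"users": (1000, 987654), "orders": (250, 123456)}
target = {"users": (1000, 987654), "orders": (249, 111111)}
print(mismatched_tables(source, target))  # ['orders']
```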
3. Plan your switchover sequence
The actual switchover requires precise timing:
- Stop write traffic to the source database
- Wait for replication lag to reach zero
- Verify data consistency with checksums
- Update application database configuration
- Redirect traffic to new database
- Monitor for application errors
Document exact commands and timing for each step.
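The sequence above can be scripted as an ordered checklist that halts on the first failed step, giving rollback a known starting point. In this sketch each callable stands in for one of your real operational commands:

```python
# Hypothetical sketch: run switchover steps in order, aborting on the
# first failure so a rollback can begin from a known point.
def run_switchover(steps):
    """steps: list of (name, callable returning bool). Stops on failure."""
    for name, check in steps:
        if not check():
            return f"aborted at: {name}"
    return "switchover complete"

steps = [
    ("stop writes", lambda: True),
    ("replication lag zero", lambda: True),
    ("checksums match", lambda: False),  # simulated failure
    ("update app config", lambda: True),
]
print(run_switchover(steps))  # aborted at: checksums match
```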
Verification: confirming your migration works
A successful zero-downtime migration requires multiple verification layers.
Application-level verification
Monitor your application's key metrics during and after migration:
# Check response times
curl -w "Total time: %{time_total}s\n" -o /dev/null -s https://your-app.com/health
# Monitor error rates
grep "ERROR" /var/log/application.log | wc -l
Response times should remain stable. Any spike indicates connection pool issues or configuration problems.
Database performance verification
Compare key metrics before and after migration:
-- Query performance comparison (MySQL; timer waits are in picoseconds,
-- so divide by 10^9 for milliseconds)
SELECT digest_text,
       avg_timer_wait / 1000000000 AS avg_time_ms,
       count_star AS executions
FROM performance_schema.events_statements_summary_by_digest
ORDER BY avg_timer_wait DESC
LIMIT 10;
Performance should improve or remain equivalent. Degradation suggests configuration issues.
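One hedged way to do the before/after comparison programmatically, given mean times keyed by query digest exported from snapshots of the query above (the tolerance value and names are ours):

```python
# Hypothetical helper: flag query digests whose mean time regressed by
# more than `tolerance` (1.10 = 10% slower) after migration.
def regressed_queries(before_ms: dict, after_ms: dict, tolerance=1.10):
    """Digests present in both snapshots that got materially slower."""
    return sorted(q for q, t in after_ms.items()
                  if q in before_ms and t > before_ms[q] * tolerance)

before = {"users_by_id": 4.0, "orders_by_date": 9.0}
after = {"users_by_id": 4.2, "orders_by_date": 12.5}
print(regressed_queries(before, after))  # ['orders_by_date']
```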
Connection and load verification
Verify your new database handles the production load:
# Monitor active connections
watch "mysql -e 'SHOW STATUS LIKE \"Threads_connected\";'"

# Check CPU and memory usage
top -p $(pgrep mysqld)
Resource usage patterns should match your capacity planning.
Common pitfalls to avoid
Skipping replication lag monitoring: Always verify replication is current before switching over. Even seconds of lag can cause data inconsistencies.
Insufficient connection pool configuration: Database performance issues often stem from connection pool mismatches between old and new environments.
Missing application-specific indexes: Your new database might have different query execution plans. Verify all expected indexes exist and are being used.
Inadequate rollback planning: Always maintain the ability to switch back to your original database if problems arise.
Next steps and related reading
After successful migration, focus on preventing future database problems:
- Implement automated monitoring for the warning signals covered here
- Set up regular performance baselines to detect gradual degradation
- Plan capacity upgrades based on growth trends, not crisis response
- Document your migration process for future infrastructure changes
For comprehensive infrastructure reliability practices, review our guide on production incident management.
Understanding when your entire infrastructure setup needs attention helps prevent database problems from recurring in new environments.
Long-term database health
Database warning signals indicate broader infrastructure patterns. Once you've successfully migrated, the monitoring practices established here prevent future emergency situations.
Regular measurement of connection patterns, query performance, and resource utilization keeps your database performing well as your application grows. Early detection always costs less than emergency response.
Need this running in production without building it yourself? See our managed infrastructure services or schedule a call.