Optimize Linux I/O performance with kernel tuning and storage schedulers for high-throughput workloads

Intermediate 25 min Apr 03, 2026
Ubuntu 24.04 Ubuntu 22.04 Debian 12 AlmaLinux 9 Rocky Linux 9 Fedora 41

Learn how to optimize Linux I/O performance through kernel parameter tuning, storage scheduler configuration, and filesystem optimizations. This tutorial covers scheduler selection, queue depth tuning, and performance monitoring for high-throughput applications.

Prerequisites

  • Root access to the Linux server
  • Basic understanding of Linux command line
  • Storage devices to optimize (SSD, NVMe, or HDD)

What this solves

Poor I/O performance can severely impact database servers, web applications, and data processing workloads. Linux I/O schedulers and kernel parameters aren't optimized for all workload types by default, leading to bottlenecks in high-throughput scenarios.

This tutorial shows you how to identify I/O bottlenecks, select appropriate schedulers for different storage types, tune kernel parameters, and optimize filesystem mount options for maximum throughput and minimum latency.

Step-by-step configuration

Install monitoring and benchmarking tools

Install essential tools for monitoring I/O performance and benchmarking storage devices.

# Debian/Ubuntu
sudo apt update
sudo apt install -y sysstat iotop fio hdparm nvme-cli

# AlmaLinux/Rocky/Fedora
sudo dnf install -y sysstat iotop fio hdparm nvme-cli
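The two install commands above target different distribution families. If you script the setup, the right one can be picked from /etc/os-release; `pick_install_cmd` below is a hypothetical helper of our own, not part of any distro tooling:

```shell
# Map an os-release ID to the matching install command (sketch).
pick_install_cmd() {
  case "$1" in
    ubuntu|debian)                     echo "apt install -y" ;;
    almalinux|rocky|fedora|rhel|centos) echo "dnf install -y" ;;
    *)                                 echo "unsupported" ;;
  esac
}

# Usage: read ID from /etc/os-release, then install the tool set:
#   . /etc/os-release
#   sudo $(pick_install_cmd "$ID") sysstat iotop fio hdparm nvme-cli
pick_install_cmd ubuntu
```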

Analyze current I/O performance

Check current I/O scheduler settings and baseline performance before making changes.

# Check current schedulers for all block devices
for dev in /sys/block/*/queue/scheduler; do
  echo "$dev: $(cat $dev)"
done

Show current I/O statistics

iostat -x 1 3

Check NVMe device information (if applicable)

sudo nvme list

Configure I/O schedulers for different storage types

Set optimal schedulers based on storage technology. NVMe SSDs benefit from none or mq-deadline, while HDDs work better with bfq or mq-deadline.

# Check storage type and set appropriate scheduler
lsblk -d -o name,rota

For NVMe/SSD (rota=0) - use none scheduler

echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

For HDDs (rota=1) - use bfq scheduler

echo bfq | sudo tee /sys/block/sda/queue/scheduler

For SATA SSDs (rota=0) - use mq-deadline

echo mq-deadline | sudo tee /sys/block/sdb/queue/scheduler
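The per-device choices above can be collapsed into one sketch that reads each device's rotational flag and picks a scheduler. `scheduler_for` is a hypothetical helper name, and the loop only prints what it would do; replace the final echo with a tee into the scheduler file to apply for real:

```shell
# Pick a scheduler from the rotational flag, per the guidance above.
scheduler_for() {  # $1 = contents of /sys/block/<dev>/queue/rotational
  if [ "$1" = "1" ]; then
    echo bfq    # spinning disk
  else
    echo none   # NVMe; prefer mq-deadline for SATA SSDs
  fi
}

# Dry run over all block devices.
for rotfile in /sys/block/*/queue/rotational; do
  [ -e "$rotfile" ] || continue
  dev=$(basename "$(dirname "$(dirname "$rotfile")")")
  echo "would set $dev -> $(scheduler_for "$(cat "$rotfile")")"
done
```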

Make scheduler changes persistent

Create udev rules to automatically apply scheduler settings on boot based on device type.

# /etc/udev/rules.d/60-io-schedulers.rules
# Set scheduler for NVMe devices
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"

Set scheduler for SSDs

ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"

Set scheduler for HDDs

ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
# Apply udev rules
sudo udevadm control --reload-rules
sudo udevadm trigger

Optimize I/O queue depths

Adjust queue depths to match storage capabilities and workload requirements. Higher queue depths improve throughput but may increase latency.
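A rough way to size the queue: by Little's law, the number of requests a device keeps in flight is approximately IOPS times average latency. If the computed in-flight count is far below nr_requests, raising nr_requests further only adds buffering, not throughput. A quick sketch (`inflight` is our own helper name):

```shell
# Little's law: in-flight requests ~= IOPS x average latency.
# A device doing 200k IOPS at 0.5 ms average latency keeps about
# 200000 * 0.0005 = 100 requests in flight.
inflight() {  # $1 = IOPS, $2 = average latency in ms
  awk -v iops="$1" -v lat_ms="$2" 'BEGIN { printf "%d\n", iops * lat_ms / 1000 }'
}
inflight 200000 0.5   # prints 100
```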

# Check current queue depth
cat /sys/block/nvme0n1/queue/nr_requests

Increase queue depth for high-throughput NVMe (some devices cap this value; if so, the write fails with "Invalid argument")

echo 1024 | sudo tee /sys/block/nvme0n1/queue/nr_requests

Set read-ahead for sequential workloads (in KB)

echo 4096 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb

For databases, reduce read-ahead to minimize memory usage

echo 128 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb
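The two read-ahead settings above can be wrapped in a small chooser keyed on workload type. `readahead_for` is a hypothetical helper, and the fallback default of 256 is our own assumption:

```shell
# Pick read_ahead_kb by workload, mirroring the two settings above.
readahead_for() {
  case "$1" in
    sequential) echo 4096 ;;  # streaming/backup: large read-ahead
    random)     echo 128  ;;  # databases: small read-ahead saves cache
    *)          echo 256  ;;  # middle-ground default (assumption)
  esac
}

# Apply, e.g.:
#   readahead_for random | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb
readahead_for sequential
```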

Configure kernel I/O parameters

Optimize kernel parameters for better I/O performance, including dirty page handling and CPU scaling.

# /etc/sysctl.d/99-io-performance.conf
# Reduce dirty page writeback for consistent performance
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200

Optimize for I/O intensive workloads

vm.swappiness = 1
vm.vfs_cache_pressure = 50

Increase maximum number of memory map areas

vm.max_map_count = 262144

TCP buffer tuning for network I/O

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
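A sanity check for those maximums: the buffer needed to keep a link full is its bandwidth-delay product. The 134217728-byte (128 MiB) ceiling covers, for example, a 10 Gbit/s link at about 100 ms RTT. `bdp_bytes` below is a hypothetical helper of our own:

```shell
# Bandwidth-delay product: bytes needed in flight to keep a link full.
bdp_bytes() {  # $1 = link speed in Mbit/s, $2 = RTT in ms
  awk -v mbps="$1" -v rtt="$2" \
    'BEGIN { printf "%d\n", mbps * 1000000 / 8 * rtt / 1000 }'
}

# 10 Gbit/s at 100 ms RTT ~= 125 MB, just under the 128 MiB max above:
bdp_bytes 10000 100
```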

CPU scheduler for I/O-bound processes

# kernel.sched_migration_cost_ns was moved out of sysctl into debugfs in
# kernel 5.13, so on the distributions covered here set it via:
echo 5000000 | sudo tee /sys/kernel/debug/sched/migration_cost_ns
# Apply kernel parameters
sudo sysctl -p /etc/sysctl.d/99-io-performance.conf
Note: The dirty page settings above optimize for consistent write performance. Lower values cause more frequent but smaller writes, reducing latency spikes.
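To see what a ratio means on your hardware, translate it into absolute megabytes against total RAM. `dirty_mib_for` is a hypothetical helper; the 64 GiB figure is just an example:

```shell
# Translate a vm.dirty_ratio percentage into MiB of allowed dirty pages.
dirty_mib_for() {  # $1 = ratio in percent, $2 = total memory in MiB
  awk -v pct="$1" -v mem_mib="$2" 'BEGIN { printf "%d\n", mem_mib * pct / 100 }'
}

# On a 64 GiB host, dirty_ratio=10 allows roughly 6553 MiB of dirty pages
# before writers are throttled:
dirty_mib_for 10 65536
```

In practice, read MemTotal from /proc/meminfo instead of hard-coding it.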

Optimize filesystem mount options

Configure mount options for better I/O performance based on filesystem type and use case.

# High-performance ext4 options for databases
/dev/nvme0n1p1 /var/lib/mysql ext4 defaults,noatime,nobarrier,data=writeback 0 2

Balanced ext4 options for general use

/dev/nvme0n1p2 /opt/data ext4 defaults,noatime,commit=30 0 2

XFS options for large files and high throughput

/dev/nvme0n1p3 /var/lib/backups xfs defaults,noatime,logbsize=256k,largeio 0 2
# Test mount options before rebooting
sudo mount -o remount,noatime /var/lib/mysql

Verify mount options

mount | grep nvme
Warning: The nobarrier option improves performance but may risk data integrity during power failures. Only use with UPS protection or for non-critical data.
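When verifying remounts, it helps to test for a specific option rather than eyeballing the mount output. `has_mount_opt` below is a hypothetical helper that matches one option inside a comma-separated option string:

```shell
# Check whether a comma-separated mount-option string contains an option.
has_mount_opt() {  # $1 = option string, $2 = option to look for
  case ",$1," in
    *",$2,"*) echo yes ;;
    *)        echo no  ;;
  esac
}

# Real usage (findmnt is part of util-linux):
#   has_mount_opt "$(findmnt -no OPTIONS /var/lib/mysql)" noatime
has_mount_opt "rw,noatime,data=writeback" noatime
```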

Configure per-process I/O scheduling

Set I/O priority classes for different applications to prevent I/O interference.

# Set real-time I/O priority for database
sudo ionice -c 1 -n 4 -p $(pgrep mysqld)

Set idle priority for backup processes

sudo ionice -c 3 -p $(pgrep backup)

Check I/O priorities

for pid in $(pgrep -f mysql); do
  echo "PID $pid: $(ionice -p $pid)"
done
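Rather than re-prioritizing backups after they start, you can launch them at idle priority from the beginning. `run_idle` is a hypothetical wrapper of our own; DRY_RUN=1 prints the command instead of executing it:

```shell
# Launch a command at idle I/O and lowest CPU priority, so it only gets
# disk time when nothing else wants it.
run_idle() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "ionice -c 3 nice -n 19 $*"
  else
    ionice -c 3 nice -n 19 "$@"
  fi
}

DRY_RUN=1 run_idle tar czf /backup/data.tar.gz /opt/data
```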

Create I/O performance monitoring script

Set up continuous monitoring to track I/O performance improvements and identify bottlenecks.

#!/bin/bash
# /usr/local/bin/io-monitor.sh

LOGFILE="/var/log/io-performance.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

echo "=== I/O Performance Report - $TIMESTAMP ===" >> $LOGFILE

# Device statistics
echo "Device Statistics:" >> $LOGFILE
iostat -x 1 1 | grep -E '(Device|nvme|sd[a-z])' >> $LOGFILE

# Top I/O processes (-b runs iotop non-interactively)
echo "Top I/O Processes:" >> $LOGFILE
iotop -b -a -o -d 1 -n 1 | head -20 >> $LOGFILE

# Queue depths and schedulers
echo "Scheduler Configuration:" >> $LOGFILE
for dev in /sys/block/*/queue/scheduler; do
  device=$(echo $dev | cut -d'/' -f4)
  scheduler=$(cat $dev | grep -o '\[.*\]' | tr -d '[]')
  queue_depth=$(cat /sys/block/$device/queue/nr_requests)
  echo "$device: scheduler=$scheduler, queue_depth=$queue_depth" >> $LOGFILE
done
echo "" >> $LOGFILE

Save the script as /usr/local/bin/io-monitor.sh, then make it executable

sudo chmod 755 /usr/local/bin/io-monitor.sh

Create systemd timer for regular monitoring

sudo tee /etc/systemd/system/io-monitor.timer > /dev/null << 'EOF'
[Unit]
Description=I/O Performance Monitor Timer

[Timer]
OnCalendar=*:0/10
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo tee /etc/systemd/system/io-monitor.service > /dev/null << 'EOF'
[Unit]
Description=I/O Performance Monitor

[Service]
Type=oneshot
ExecStart=/usr/local/bin/io-monitor.sh
EOF

Enable monitoring timer

sudo systemctl daemon-reload
sudo systemctl enable --now io-monitor.timer

Benchmark I/O improvements

Run comprehensive I/O benchmarks

Use fio to test different I/O patterns and measure performance improvements.

# Random read performance (database-like workload)
sudo fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting --filename=/dev/nvme0n1

Sequential write performance (backup/logging workload)

sudo fio --name=seqwrite --ioengine=libaio --iodepth=32 --rw=write --bs=64k --direct=1 --size=2G --numjobs=2 --runtime=60 --group_reporting --filename=/dev/nvme0n1

Mixed workload test

sudo fio --name=mixed --ioengine=libaio --iodepth=16 --rw=randrw --rwmixread=70 --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting --filename=/dev/nvme0n1
Warning: These fio tests write directly to the block device and will destroy data. Use a test partition or ensure you have backups.
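When comparing before/after runs, it helps to cross-check fio's numbers: bandwidth should equal IOPS times block size. `bw_mib_s` is a hypothetical helper for that arithmetic:

```shell
# Cross-check fio results: bandwidth = IOPS x block size.
bw_mib_s() {  # $1 = IOPS, $2 = block size in KiB
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%d\n", iops * bs_kib / 1024 }'
}

# 100k IOPS at 4 KiB blocks corresponds to roughly 390 MiB/s:
bw_mib_s 100000 4
```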

Verify your setup

# Check active I/O schedulers
for dev in /sys/block/*/queue/scheduler; do
  echo "$dev: $(cat $dev)"
done

Verify kernel parameters

sudo sysctl -a | grep -E 'vm.dirty|vm.swappiness'

Check I/O statistics

iostat -x 1 3

Monitor top I/O processes

iotop -a -o -d 2

Check monitoring timer status

sudo systemctl status io-monitor.timer

Common issues

Symptom | Cause | Fix
High I/O wait times | Wrong scheduler for storage type | Switch to appropriate scheduler (none for NVMe, bfq for HDD)
Inconsistent write performance | Large dirty page ratio | Reduce vm.dirty_ratio to 10 or lower
Scheduler changes don't persist | Missing udev rules | Create /etc/udev/rules.d/60-io-schedulers.rules
Database slowdowns during backups | I/O priority conflicts | Set backup processes to idle priority with ionice -c 3
Low throughput on NVMe | Insufficient queue depth | Increase nr_requests to 1024 or higher
High memory usage | Excessive read-ahead buffering | Reduce read_ahead_kb for random access workloads
