Optimize Linux I/O performance with kernel tuning and storage schedulers for high-throughput workloads

Intermediate 25 min Apr 03, 2026
Ubuntu 24.04 Ubuntu 22.04 Debian 12 AlmaLinux 9 Rocky Linux 9 Fedora 41

Learn how to optimize Linux I/O performance through kernel parameter tuning, storage scheduler configuration, and filesystem optimizations. This tutorial covers scheduler selection, queue depth tuning, and performance monitoring for high-throughput applications.

Prerequisites

  • Root access to the Linux server
  • Basic understanding of Linux command line
  • Storage devices to optimize (SSD, NVMe, or HDD)

What this solves

Poor I/O performance can severely impact database servers, web applications, and data processing workloads. Linux I/O schedulers and kernel parameters aren't optimized for all workload types by default, leading to bottlenecks in high-throughput scenarios.

This tutorial shows you how to identify I/O bottlenecks, select appropriate schedulers for different storage types, tune kernel parameters, and optimize filesystem mount options for maximum throughput and minimum latency.

Step-by-step configuration

Install monitoring and benchmarking tools

Install essential tools for monitoring I/O performance and benchmarking storage devices.

# Debian/Ubuntu
sudo apt update
sudo apt install -y sysstat iotop fio hdparm nvme-cli

# AlmaLinux/Rocky/Fedora
sudo dnf install -y sysstat iotop fio hdparm nvme-cli
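The two install commands above target different distribution families. If you script the setup, the right one can be picked from /etc/os-release; `pick_install_cmd` below is a hypothetical helper of our own, not part of any distro tooling:

```shell
# Map an os-release ID to the matching install command (sketch).
pick_install_cmd() {
  case "$1" in
    ubuntu|debian)                     echo "apt install -y" ;;
    almalinux|rocky|fedora|rhel|centos) echo "dnf install -y" ;;
    *)                                 echo "unsupported" ;;
  esac
}

# Usage: read ID from /etc/os-release, then install the tool set:
#   . /etc/os-release
#   sudo $(pick_install_cmd "$ID") sysstat iotop fio hdparm nvme-cli
pick_install_cmd ubuntu
```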

Analyze current I/O performance

Check current I/O scheduler settings and baseline performance before making changes.

# Check current schedulers for all block devices
for dev in /sys/block/*/queue/scheduler; do
  echo "$dev: $(cat $dev)"
done

Show current I/O statistics

iostat -x 1 3

Check NVMe device information (if applicable)

sudo nvme list

Configure I/O schedulers for different storage types

Set optimal schedulers based on storage technology. NVMe SSDs benefit from none or mq-deadline, while HDDs work better with bfq or mq-deadline.

# Check storage type and set appropriate scheduler
lsblk -d -o name,rota

For NVMe/SSD (rota=0) - use none scheduler

echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

For HDDs (rota=1) - use bfq scheduler

echo bfq | sudo tee /sys/block/sda/queue/scheduler

For SATA SSDs (rota=0) - use mq-deadline

echo mq-deadline | sudo tee /sys/block/sdb/queue/scheduler
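The per-device choices above can be collapsed into one sketch that reads each device's rotational flag and picks a scheduler. `scheduler_for` is a hypothetical helper name, and the loop only prints what it would do; replace the final echo with a tee into the scheduler file to apply for real:

```shell
# Pick a scheduler from the rotational flag, per the guidance above.
scheduler_for() {  # $1 = contents of /sys/block/<dev>/queue/rotational
  if [ "$1" = "1" ]; then
    echo bfq    # spinning disk
  else
    echo none   # NVMe; prefer mq-deadline for SATA SSDs
  fi
}

# Dry run over all block devices.
for rotfile in /sys/block/*/queue/rotational; do
  [ -e "$rotfile" ] || continue
  dev=$(basename "$(dirname "$(dirname "$rotfile")")")
  echo "would set $dev -> $(scheduler_for "$(cat "$rotfile")")"
done
```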

Make scheduler changes persistent

Create udev rules to automatically apply scheduler settings on boot based on device type.

# /etc/udev/rules.d/60-io-schedulers.rules
# Set scheduler for NVMe devices
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"

Set scheduler for SSDs

ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"

Set scheduler for HDDs

ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
# Apply udev rules
sudo udevadm control --reload-rules
sudo udevadm trigger

Optimize I/O queue depths

Adjust queue depths to match storage capabilities and workload requirements. Higher queue depths improve throughput but may increase latency.
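A rough way to size the queue: by Little's law, the number of requests a device keeps in flight is approximately IOPS times average latency. If the computed in-flight count is far below nr_requests, raising nr_requests further only adds buffering, not throughput. A quick sketch (`inflight` is our own helper name):

```shell
# Little's law: in-flight requests ~= IOPS x average latency.
# A device doing 200k IOPS at 0.5 ms average latency keeps about
# 200000 * 0.0005 = 100 requests in flight.
inflight() {  # $1 = IOPS, $2 = average latency in ms
  awk -v iops="$1" -v lat_ms="$2" 'BEGIN { printf "%d\n", iops * lat_ms / 1000 }'
}
inflight 200000 0.5   # prints 100
```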

# Check current queue depth
cat /sys/block/nvme0n1/queue/nr_requests

Increase queue depth for high-throughput NVMe (some devices cap this value; if so, the write fails with "Invalid argument")

echo 1024 | sudo tee /sys/block/nvme0n1/queue/nr_requests

Set read-ahead for sequential workloads (in KB)

echo 4096 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb

For databases, reduce read-ahead to minimize memory usage

echo 128 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb
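The two read-ahead settings above can be wrapped in a small chooser keyed on workload type. `readahead_for` is a hypothetical helper, and the fallback default of 256 is our own assumption:

```shell
# Pick read_ahead_kb by workload, mirroring the two settings above.
readahead_for() {
  case "$1" in
    sequential) echo 4096 ;;  # streaming/backup: large read-ahead
    random)     echo 128  ;;  # databases: small read-ahead saves cache
    *)          echo 256  ;;  # middle-ground default (assumption)
  esac
}

# Apply, e.g.:
#   readahead_for random | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb
readahead_for sequential
```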

Configure kernel I/O parameters

Optimize kernel parameters for better I/O performance, including dirty page handling and CPU scaling.

# /etc/sysctl.d/99-io-performance.conf
# Reduce dirty page writeback for consistent performance
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200

Optimize for I/O intensive workloads

vm.swappiness = 1
vm.vfs_cache_pressure = 50

Increase maximum number of memory map areas

vm.max_map_count = 262144

TCP buffer tuning for network I/O

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
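A sanity check for those maximums: the buffer needed to keep a link full is its bandwidth-delay product. The 134217728-byte (128 MiB) ceiling covers, for example, a 10 Gbit/s link at about 100 ms RTT. `bdp_bytes` below is a hypothetical helper of our own:

```shell
# Bandwidth-delay product: bytes needed in flight to keep a link full.
bdp_bytes() {  # $1 = link speed in Mbit/s, $2 = RTT in ms
  awk -v mbps="$1" -v rtt="$2" \
    'BEGIN { printf "%d\n", mbps * 1000000 / 8 * rtt / 1000 }'
}

# 10 Gbit/s at 100 ms RTT ~= 125 MB, just under the 128 MiB max above:
bdp_bytes 10000 100
```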

CPU scheduler for I/O-bound processes

# kernel.sched_migration_cost_ns was moved out of sysctl into debugfs in
# kernel 5.13, so on the distributions covered here set it via:
echo 5000000 | sudo tee /sys/kernel/debug/sched/migration_cost_ns
# Apply kernel parameters
sudo sysctl -p /etc/sysctl.d/99-io-performance.conf
Note: The dirty page settings above optimize for consistent write performance. Lower values cause more frequent but smaller writes, reducing latency spikes.
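To see what a ratio means on your hardware, translate it into absolute megabytes against total RAM. `dirty_mib_for` is a hypothetical helper; the 64 GiB figure is just an example:

```shell
# Translate a vm.dirty_ratio percentage into MiB of allowed dirty pages.
dirty_mib_for() {  # $1 = ratio in percent, $2 = total memory in MiB
  awk -v pct="$1" -v mem_mib="$2" 'BEGIN { printf "%d\n", mem_mib * pct / 100 }'
}

# On a 64 GiB host, dirty_ratio=10 allows roughly 6553 MiB of dirty pages
# before writers are throttled:
dirty_mib_for 10 65536
```

In practice, read MemTotal from /proc/meminfo instead of hard-coding it.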

Optimize filesystem mount options

Configure mount options for better I/O performance based on filesystem type and use case.

# High-performance ext4 options for databases
/dev/nvme0n1p1 /var/lib/mysql ext4 defaults,noatime,nobarrier,data=writeback 0 2

Balanced ext4 options for general use

/dev/nvme0n1p2 /opt/data ext4 defaults,noatime,commit=30 0 2

XFS options for large files and high throughput

/dev/nvme0n1p3 /var/lib/backups xfs defaults,noatime,logbsize=256k,largeio 0 2
# Test mount options before rebooting
sudo mount -o remount,noatime /var/lib/mysql

Verify mount options

mount | grep nvme
Warning: The nobarrier option improves performance but may risk data integrity during power failures. Only use with UPS protection or for non-critical data.
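When verifying remounts, it helps to test for a specific option rather than eyeballing the mount output. `has_mount_opt` below is a hypothetical helper that matches one option inside a comma-separated option string:

```shell
# Check whether a comma-separated mount-option string contains an option.
has_mount_opt() {  # $1 = option string, $2 = option to look for
  case ",$1," in
    *",$2,"*) echo yes ;;
    *)        echo no  ;;
  esac
}

# Real usage (findmnt is part of util-linux):
#   has_mount_opt "$(findmnt -no OPTIONS /var/lib/mysql)" noatime
has_mount_opt "rw,noatime,data=writeback" noatime
```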

Configure per-process I/O scheduling

Set I/O priority classes for different applications to prevent I/O interference.

# Set real-time I/O priority for database
sudo ionice -c 1 -n 4 -p $(pgrep mysqld)

Set idle priority for backup processes

sudo ionice -c 3 -p $(pgrep backup)

Check I/O priorities

for pid in $(pgrep -f mysql); do
  echo "PID $pid: $(ionice -p $pid)"
done
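Rather than re-prioritizing backups after they start, you can launch them at idle priority from the beginning. `run_idle` is a hypothetical wrapper of our own; DRY_RUN=1 prints the command instead of executing it:

```shell
# Launch a command at idle I/O and lowest CPU priority, so it only gets
# disk time when nothing else wants it.
run_idle() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "ionice -c 3 nice -n 19 $*"
  else
    ionice -c 3 nice -n 19 "$@"
  fi
}

DRY_RUN=1 run_idle tar czf /backup/data.tar.gz /opt/data
```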

Create I/O performance monitoring script

Set up continuous monitoring to track I/O performance improvements and identify bottlenecks.

#!/bin/bash
# /usr/local/bin/io-monitor.sh

LOGFILE="/var/log/io-performance.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

echo "=== I/O Performance Report - $TIMESTAMP ===" >> $LOGFILE

# Device statistics
echo "Device Statistics:" >> $LOGFILE
iostat -x 1 1 | grep -E '(Device|nvme|sd[a-z])' >> $LOGFILE

# Top I/O processes (-b runs iotop non-interactively)
echo "Top I/O Processes:" >> $LOGFILE
iotop -b -a -o -d 1 -n 1 | head -20 >> $LOGFILE

# Queue depths and schedulers
echo "Scheduler Configuration:" >> $LOGFILE
for dev in /sys/block/*/queue/scheduler; do
  device=$(echo $dev | cut -d'/' -f4)
  scheduler=$(cat $dev | grep -o '\[.*\]' | tr -d '[]')
  queue_depth=$(cat /sys/block/$device/queue/nr_requests)
  echo "$device: scheduler=$scheduler, queue_depth=$queue_depth" >> $LOGFILE
done
echo "" >> $LOGFILE

Save the script as /usr/local/bin/io-monitor.sh, then make it executable

sudo chmod 755 /usr/local/bin/io-monitor.sh

Create systemd timer for regular monitoring

sudo tee /etc/systemd/system/io-monitor.timer > /dev/null << 'EOF'
[Unit]
Description=I/O Performance Monitor Timer

[Timer]
OnCalendar=*:0/10
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo tee /etc/systemd/system/io-monitor.service > /dev/null << 'EOF'
[Unit]
Description=I/O Performance Monitor

[Service]
Type=oneshot
ExecStart=/usr/local/bin/io-monitor.sh
EOF

Enable monitoring timer

sudo systemctl daemon-reload
sudo systemctl enable --now io-monitor.timer

Benchmark I/O improvements

Run comprehensive I/O benchmarks

Use fio to test different I/O patterns and measure performance improvements.

# Random read performance (database-like workload)
sudo fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting --filename=/dev/nvme0n1

Sequential write performance (backup/logging workload)

sudo fio --name=seqwrite --ioengine=libaio --iodepth=32 --rw=write --bs=64k --direct=1 --size=2G --numjobs=2 --runtime=60 --group_reporting --filename=/dev/nvme0n1

Mixed workload test

sudo fio --name=mixed --ioengine=libaio --iodepth=16 --rw=randrw --rwmixread=70 --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting --filename=/dev/nvme0n1
Warning: These fio tests write directly to the block device and will destroy data. Use a test partition or ensure you have backups.
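When comparing before/after runs, it helps to cross-check fio's numbers: bandwidth should equal IOPS times block size. `bw_mib_s` is a hypothetical helper for that arithmetic:

```shell
# Cross-check fio results: bandwidth = IOPS x block size.
bw_mib_s() {  # $1 = IOPS, $2 = block size in KiB
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%d\n", iops * bs_kib / 1024 }'
}

# 100k IOPS at 4 KiB blocks corresponds to roughly 390 MiB/s:
bw_mib_s 100000 4
```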

Verify your setup

# Check active I/O schedulers
for dev in /sys/block/*/queue/scheduler; do
  echo "$dev: $(cat $dev)"
done

Verify kernel parameters

sudo sysctl -a | grep -E 'vm.dirty|vm.swappiness'

Check I/O statistics

iostat -x 1 3

Monitor top I/O processes

iotop -a -o -d 2

Check monitoring timer status

sudo systemctl status io-monitor.timer

Common issues

Symptom | Cause | Fix
High I/O wait times | Wrong scheduler for storage type | Switch to appropriate scheduler (none for NVMe, bfq for HDD)
Inconsistent write performance | Large dirty page ratio | Reduce vm.dirty_ratio to 10 or lower
Scheduler changes don't persist | Missing udev rules | Create /etc/udev/rules.d/60-io-schedulers.rules
Database slowdowns during backups | I/O priority conflicts | Set backup processes to idle priority with ionice -c 3
Low throughput on NVMe | Insufficient queue depth | Increase nr_requests to 1024 or higher
High memory usage | Excessive read-ahead buffering | Reduce read_ahead_kb for random access workloads
