Infrastructure

How a SaaS platform cut infrastructure costs by 40% while improving response times

Binadit Tech Team · Apr 27, 2026 · 9 min read

The situation: growing SaaS platform hitting infrastructure limits

A European customer relationship management platform had reached 50,000 active users and was processing 2.3 million API requests daily. Their infrastructure costs had climbed to €18,000 monthly, but performance was declining. Database queries were taking 450ms on average, and their p95 response time had degraded to 2.8 seconds during peak hours.

The platform ran on a traditional setup: three oversized virtual machines, a monolithic database server, and basic load balancing. Traffic patterns showed clear peaks during European business hours, but their infrastructure ran at full capacity 24/7.

The engineering team had tried vertical scaling twice in six months, each time hoping more CPU and RAM would solve the performance issues. Instead, costs increased while response times remained problematic. Customer complaints about slow loading times were becoming frequent.

What made this situation typical was the underlying assumption that performance and cost optimization were competing goals. The team believed faster infrastructure meant more expensive infrastructure.

What we found during the infrastructure audit

The technical audit revealed several performance bottlenecks that were also driving unnecessary costs.

Database analysis showed 73% of queries were N+1 patterns hitting the same tables repeatedly. A single user dashboard load triggered an average of 47 individual database queries. The database server, provisioned with 32 CPU cores and 128GB RAM, was actually CPU-bound due to inefficient query patterns, not data volume.

Application servers were running at 12% average CPU utilization but consuming full allocated resources. The load balancer was distributing traffic evenly across three identical instances, regardless of actual load patterns or time of day.

Cache hit rates were at 23%, well below the 80%+ we typically see in well-optimized systems. Redis was configured with default settings and treated as simple key-value storage, missing opportunities for query result caching and session management optimization.

Network analysis revealed the application was making external API calls synchronously during user requests, adding 200-800ms latency to operations that should have been asynchronous background jobs.

Most significantly, traffic monitoring showed a predictable pattern: 78% of daily load occurred between 9 AM and 6 PM Central European Time, yet infrastructure remained fully provisioned around the clock.

The approach we took and why

Rather than adding more resources, we focused on three areas where performance improvements would simultaneously reduce costs: query optimization, intelligent caching, and load-based scaling.

For database optimization, we addressed the N+1 query patterns through eager loading and query consolidation. This approach would reduce both database server load and response times, potentially allowing downsizing of database resources.

Cache strategy focused on implementing result caching for expensive queries and API responses. By caching database query results and external API responses, we could reduce both database load and external service costs while dramatically improving response times.

The scaling approach involved implementing horizontal scaling with automatic provisioning based on actual demand. Instead of running three full-size instances continuously, we designed a system that could scale from one instance during off-hours to four instances during peak load.

We also moved long-running tasks to asynchronous background processing, removing external API calls from the user request path and enabling better resource utilization.

This approach prioritized changes that would improve both performance and cost efficiency, rather than treating them as trade-offs.

Implementation details with specific configurations

Database optimization started with query analysis and consolidation. We replaced N+1 patterns with batch queries and implemented eager loading for related data. A typical user dashboard query went from 47 individual queries to 3 optimized queries using JOIN operations and strategic indexing.
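The case study doesn't name the ORM, so the following is a minimal sketch in SQLAlchemy 2.0 style; User, Contact, and Deal are hypothetical models standing in for the platform's real CRM schema, and the point is only to contrast the N+1 pattern with the eager-loaded version.

```python
from sqlalchemy import ForeignKey, select
from sqlalchemy.orm import (
    DeclarativeBase, Mapped, Session, mapped_column, relationship, selectinload,
)

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    contacts: Mapped[list["Contact"]] = relationship()

class Contact(Base):
    __tablename__ = "contacts"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    deals: Mapped[list["Deal"]] = relationship()

class Deal(Base):
    __tablename__ = "deals"
    id: Mapped[int] = mapped_column(primary_key=True)
    value: Mapped[int]
    contact_id: Mapped[int] = mapped_column(ForeignKey("contacts.id"))

# Before: N+1 pattern -- one query for the user, then separate lazy-loaded
# queries for each contact and each contact's deals while the dashboard renders.
def load_dashboard_n_plus_one(session: Session, user_id: int):
    user = session.get(User, user_id)
    return [{"contact": c.name, "deals": [d.value for d in c.deals]}
            for c in user.contacts]

# After: eager loading consolidates the same data into a handful of queries.
def load_dashboard_eager(session: Session, user_id: int):
    user = session.scalars(
        select(User)
        .options(selectinload(User.contacts).selectinload(Contact.deals))
        .where(User.id == user_id)
    ).one()
    return [{"contact": c.name, "deals": [d.value for d in c.deals]}
            for c in user.contacts]
```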

For the Redis caching layer, we implemented a multi-tier strategy:

Application-level query result caching with 15-minute TTL for user-specific data and 1-hour TTL for shared reference data. Database query results were cached using query hash keys, reducing identical query execution from dozens per minute to near-zero during typical usage.

API response caching for external service calls with 30-minute TTL, completely removing these calls from the user request path. External API responses were cached and refreshed asynchronously, eliminating the 200-800ms latency from user-facing operations.

Session and user preference caching reduced database lookups for authentication and personalization data.
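To make the query-result tier concrete, here is a minimal sketch using the redis-py client; the key scheme, helper name, and TTL constants are illustrative rather than the platform's actual implementation, but they show the hash-keyed caching with per-tier TTLs described above.

```python
import hashlib
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

USER_TTL = 15 * 60       # 15-minute TTL for user-specific data
REFERENCE_TTL = 60 * 60  # 1-hour TTL for shared reference data

def cached_query(sql: str, params: tuple, run_query, ttl: int = USER_TTL):
    """Serve a query result from Redis when possible, keyed by a hash of the
    SQL text and parameters; fall back to the database and cache the result."""
    key = "q:" + hashlib.sha256(json.dumps([sql, params]).encode()).hexdigest()
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)           # cache hit: no database round trip
    result = run_query(sql, params)         # cache miss: hit the database once
    r.setex(key, ttl, json.dumps(result))   # store with the tier's TTL
    return result
```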

The horizontal scaling implementation used cloud-native auto-scaling policies:

Minimum instances: 1
Maximum instances: 4
Scale up: CPU > 70% for 5 minutes
Scale down: CPU < 30% for 15 minutes
Target group health checks: 30-second intervals

Instance sizing changed from three c5.2xlarge instances (8 vCPU, 16GB RAM each) running continuously to a dynamic fleet of c5.large instances (2 vCPU, 4GB RAM each) that scaled based on demand.
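The article doesn't show the provider's configuration format, but given the c5 instance types this was an AWS environment, so a boto3 sketch of the scale-up half of the policy is a reasonable approximation; the group and policy names below are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

ASG_NAME = "web-fleet"  # placeholder; the real Auto Scaling group name isn't given

# Scale-up policy: add one c5.large instance, then wait out a cooldown.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG_NAME,
    PolicyName="scale-up-on-cpu",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)

# Alarm that fires when average CPU stays above 70% for 5 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="web-fleet-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": ASG_NAME}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
# The scale-down side mirrors this with ScalingAdjustment=-1, a 30% threshold,
# and three 5-minute evaluation periods (CPU < 30% for 15 minutes).
```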

Background job processing moved to a separate queue system using dedicated worker instances that could scale independently of web traffic. External API calls, report generation, and email sending moved to this asynchronous system.
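The queue technology isn't named in the case study; a Celery worker with a Redis broker is one common way to structure this, sketched below with an illustrative task that moves an external API call off the request path. The endpoint and the store_in_cache helper are hypothetical.

```python
# tasks.py -- Celery with a Redis broker is assumed here for illustration only.
import requests
from celery import Celery

app = Celery("crm", broker="redis://localhost:6379/1")

@app.task(bind=True, max_retries=3, default_retry_delay=30)
def refresh_enrichment_data(self, contact_id: int):
    """Call the external enrichment API outside the user request and cache it."""
    try:
        resp = requests.get(
            f"https://enrichment.example.com/contacts/{contact_id}", timeout=10
        )
        resp.raise_for_status()
        store_in_cache(contact_id, resp.json())  # hypothetical caching helper
    except requests.RequestException as exc:
        raise self.retry(exc=exc)

# In the web handler: enqueue and return immediately instead of blocking the
# user for the 200-800ms external call.
# refresh_enrichment_data.delay(contact_id)
```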

Database server configuration was optimized for the new query patterns, with connection pooling and query plan caching enabled. The oversized database instance was downsized from 32 cores to 8 cores after query optimization reduced CPU usage by 67%.
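On the application side, connection pooling of this kind is usually a few engine settings; the sketch below uses SQLAlchemy's pool options with placeholder values, since the case study doesn't publish the actual pool sizes or database engine.

```python
from sqlalchemy import create_engine

# Pool sizes and the DSN below are placeholders, not the platform's real values.
engine = create_engine(
    "postgresql+psycopg2://crm:***@db.internal/crm",
    pool_size=20,          # steady-state connections kept open per process
    max_overflow=10,       # short-lived extra connections during bursts
    pool_recycle=1800,     # recycle connections every 30 minutes
    pool_pre_ping=True,    # detect dropped connections before using them
)
```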

Results with real numbers from before and after

Performance improvements were substantial and immediate. Average database query time dropped from 450ms to 89ms, an 80% reduction. API response times improved from 2.8 seconds p95 to 1.1 seconds p95, a 61% improvement in worst-case performance.

Cache hit rates increased from 23% to 87%, meaning most requests were served from fast cache storage rather than hitting the database. Time to first byte (TTFB) improved from 890ms average to 312ms average.

User-facing metrics showed the impact: page load times decreased from 4.2 seconds average to 2.1 seconds average, and customer complaints about slow performance dropped to near zero.

Cost reductions exceeded expectations. Monthly infrastructure spending decreased from €18,000 to €11,000, a 39% reduction. The breakdown showed where savings came from:

Application server costs dropped 64% through rightsizing and auto-scaling. Database server costs decreased 47% through optimization and downsizing. External API costs fell 31% through caching and request reduction.

Operational efficiency improved as well. System uptime remained at 99.97%, the same as before optimization. However, the auto-scaling system handled traffic spikes automatically, reducing manual intervention and on-call incidents by roughly 40%.

Resource utilization became much more efficient. Instead of running at 12% average CPU utilization on oversized instances, the new setup maintained 45-65% utilization on appropriately sized resources that scaled with demand.

The platform successfully handled its highest-ever traffic day (3.7 million API requests) three months after optimization, with better response times than the previous system handled on normal days.

What we'd do differently next time

Database migration could have been more gradual. We implemented all query optimizations simultaneously, which made it difficult to measure the impact of individual changes. A more phased approach would provide better insight into which optimizations delivered the most value.

Cache warming strategies needed more attention. During the first week after implementation, cache miss rates were higher than expected during traffic spikes, causing temporary performance degradation. Pre-warming critical cache entries during deployment would have prevented this.
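A deployment-time warm-up step could look roughly like the sketch below, which reuses the hypothetical cached_query helper and REFERENCE_TTL from the caching sketch earlier; the list of hot queries is illustrative.

```python
# warm_cache.py -- run as a post-deploy step; the query list is illustrative.
HOT_QUERIES = [
    ("SELECT id, name FROM plans", ()),              # shared reference data
    ("SELECT id, label FROM pipeline_stages", ()),
]

def warm_cache(run_query):
    """Populate the shared-reference tier before traffic arrives, so the first
    requests after a deploy don't all miss at once."""
    for sql, params in HOT_QUERIES:
        cached_query(sql, params, run_query, ttl=REFERENCE_TTL)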

Monitoring and alerting required more upfront configuration. The new distributed architecture had different failure modes than the monolithic setup, and our existing monitoring didn't catch some issues until they affected users. More comprehensive monitoring of cache hit rates, queue depths, and auto-scaling events should have been configured before go-live.
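For the cache hit rate in particular, Redis exposes its own keyspace counters, so a small check like the one below (thresholds and alerting hook are placeholders) could have surfaced the problem before users noticed.

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def cache_hit_rate() -> float:
    """Compute the hit rate from Redis' own keyspace counters."""
    stats = r.info("stats")
    hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
    return hits / (hits + misses) if (hits + misses) else 1.0

# Example check for a cron job or metrics exporter; the 0.8 threshold mirrors
# the 80%+ hit rate cited earlier for well-optimized systems.
if cache_hit_rate() < 0.8:
    print("WARNING: cache hit rate below 80%")  # placeholder for real alerting
```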

Load testing should have been more extensive. We tested the auto-scaling behavior, but didn't fully simulate the combination of high traffic and cache misses that occurred during the first major traffic spike post-migration.

Documentation of the new architecture took longer than expected to complete. The team needed clearer runbooks for the more complex system, especially for troubleshooting scenarios that were different from the previous monolithic setup.

Communication about the changes could have been better coordinated with the customer success team. They weren't prepared for the dramatic improvement in performance, which led to some initial confusion when customers mentioned faster loading times.

Key lessons for sustainable cost optimization

The most important insight was that performance optimization and cost reduction often align when you focus on efficiency rather than raw capacity. Identifying and eliminating waste in queries, caching, and resource allocation delivered both better performance and lower costs.

Query optimization provided the highest return on investment. The 80% reduction in database query time had cascading effects throughout the system, reducing CPU usage, memory pressure, and connection pool exhaustion. This single change enabled downsizing of multiple infrastructure components.

Intelligent caching proved more valuable than anticipated. Beyond the obvious performance benefits, caching reduced load on expensive resources like database servers and external APIs, creating cost savings in multiple areas.

Auto-scaling based on actual demand rather than peak capacity assumptions transformed the cost structure. Most applications have predictable traffic patterns, and rightsizing for average load with burst capability costs significantly less than provisioning for peak load continuously.

The experience reinforced that database performance optimization should be the first step in any cost reduction effort, since database bottlenecks often force oversizing of other components.

Monitoring becomes more critical with a distributed, auto-scaling architecture. The cost savings and performance improvements are sustainable only with proper visibility into system behavior and automated responses to changing conditions.

Working with managed infrastructure services that understand both performance optimization and cost management makes these types of comprehensive improvements feasible for teams focused on product development rather than infrastructure management.

Making cost optimization sustainable

Six months after implementation, the SaaS platform continues operating at the optimized cost and performance levels. Monthly infrastructure costs have remained stable at €11,000 despite 23% user growth, demonstrating that the efficiency improvements scale with business growth.

The auto-scaling system has handled seasonal traffic variations and product launches without manual intervention. Peak traffic events that previously required infrastructure planning and manual scaling now happen transparently.

Most importantly, the engineering team can focus on product features rather than infrastructure scaling decisions. The system's ability to handle growth automatically while maintaining cost efficiency has eliminated infrastructure concerns as a constraint on business development.

These results show that cost optimization doesn't require sacrificing performance when approached systematically through efficiency improvements rather than resource reduction.

Facing a similar challenge? Tell us about your setup and we will outline an approach.