Reducing AWS HPC Costs by 40%: How QyrosCloud Saved $720K Annually for a Leading Life Sciences Company
Customer Challenge
The company runs large-scale batch workloads to support drug discovery pipelines, including:
- molecular simulations
- protein interaction modeling
- high-throughput data processing
- GPU-accelerated computational workloads
As workloads scaled, cloud costs grew rapidly.
Key Challenges
- High Cost of On-Demand GPU Instances: The organization relied heavily on on-demand GPU instances, which are among the most expensive compute resources in AWS. Without any commitment-based discounts or a defined cost optimization strategy, the company incurred significant overspending, particularly during periods of peak workload demand.
- Inefficient Job Scheduling and Resource Utilization: Batch jobs were not optimized for either performance or cost efficiency. The environment suffered from suboptimal instance selection, underutilized compute resources, and inefficient job execution times. As a result, workloads took longer to complete, leading to increased compute consumption and higher overall AWS spend.
- Lack of FinOps Visibility: The organization lacked centralized visibility into key cost drivers, including idle resources and rightsizing opportunities. Without this level of insight, it was difficult to proactively manage cloud spend or implement effective cost optimization strategies across the environment.
QyrosCloud Solution
QyrosCloud implemented a comprehensive FinOps optimization strategy focused on cost reduction, performance improvement, and operational efficiency.
1. Hybrid Compute Strategy: Spot + Reserved Capacity
The team redesigned the compute strategy to replace on-demand instances with a combination of:
- Spot Instances for fault-tolerant batch workloads
- Reserved Instances / Savings Plans for baseline compute demand
This hybrid approach significantly reduced compute costs while maintaining reliability.
Key improvements:
- prioritized Spot capacity for GPU workloads
- implemented fallback mechanisms to ensure job completion
- aligned Reserved capacity with predictable workloads
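To make the effect of the hybrid split concrete, here is a minimal sketch of a blended cost estimate. The function name, the hourly rate, the capacity shares, and both discount rates are hypothetical values chosen only to illustrate the calculation; they are not figures from this engagement or from AWS pricing.

```python
def blended_cost(hours, on_demand_rate, baseline_share, spot_share,
                 committed_discount, spot_discount):
    """Estimate spend when compute hours are split across committed
    (Reserved Instances / Savings Plans), Spot, and on-demand capacity."""
    if baseline_share < 0 or spot_share < 0 or baseline_share + spot_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    # Whatever is not covered by committed or Spot capacity falls back to on-demand.
    fallback_share = 1 - baseline_share - spot_share
    committed = hours * baseline_share * on_demand_rate * (1 - committed_discount)
    spot = hours * spot_share * on_demand_rate * (1 - spot_discount)
    fallback = hours * fallback_share * on_demand_rate
    return committed + spot + fallback


# Hypothetical inputs: 1,000 GPU hours at an assumed $10/hour on-demand rate.
all_on_demand = blended_cost(1000, 10.0, 0.0, 0.0, 0.0, 0.0)
hybrid = blended_cost(1000, 10.0, baseline_share=0.3, spot_share=0.6,
                      committed_discount=0.3, spot_discount=0.6)
print(f"on-demand only: ${all_on_demand:,.0f}, hybrid: ${hybrid:,.0f}")
```

The on-demand remainder in the model stands in for the fallback path: jobs that cannot get Spot capacity still run, at the full rate.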
2. HPC Optimization with AWS ParallelCluster
Using AWS ParallelCluster, QyrosCloud optimized the HPC environment to better support distributed workloads.
Enhancements included:
- optimized cluster scaling policies
- improved job scheduling efficiency
- automated provisioning of compute nodes
- better utilization of GPU resources
This ensured compute resources were allocated dynamically based on workload demand.
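As an illustration of demand-based allocation, the sketch below computes how many GPU nodes a queue would need for its pending work, bounded by a minimum and maximum node count. It is a simplified model of the idea, not ParallelCluster's actual scaling logic; the function name and all values are assumptions.

```python
import math


def nodes_needed(pending_gpus, gpus_per_node, min_nodes, max_nodes):
    """Return the node count that covers pending GPU demand,
    clamped to the configured minimum and maximum."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    wanted = math.ceil(pending_gpus / gpus_per_node)
    return max(min_nodes, min(max_nodes, wanted))


# Hypothetical queue limits: scale between 0 and 10 nodes with 8 GPUs each.
for demand in (0, 17, 200):
    print(demand, "GPUs pending ->", nodes_needed(demand, 8, 0, 10), "nodes")
```

Allowing the minimum to be zero means no GPU nodes are billed while the queue is empty.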
3. Efficient Batch Processing with AWS Batch
QyrosCloud reconfigured AWS Batch environments to improve job orchestration.
Improvements included:
- optimized compute environments for cost efficiency
- improved job queue prioritization
- dynamic scaling of compute resources
- better alignment between job requirements and instance types
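One common way to express a Spot-first setup with a fallback in AWS Batch is a job queue that lists a Spot compute environment ahead of an On-Demand one. The sketch below only builds request bodies in the shape accepted by the Batch `CreateComputeEnvironment` and `CreateJobQueue` APIs; it does not call AWS. The environment names, queue name, vCPU limits, and instance types are hypothetical, and this is not necessarily the configuration used in this engagement.

```python
def compute_environment(name, capacity_type, max_vcpus, instance_types):
    """Build a partial request body for a managed AWS Batch compute environment."""
    strategy = ("SPOT_CAPACITY_OPTIMIZED" if capacity_type == "SPOT"
                else "BEST_FIT_PROGRESSIVE")
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "computeResources": {
            "type": capacity_type,        # "SPOT" or "EC2" (On-Demand)
            "allocationStrategy": strategy,
            "minvCpus": 0,                # scale to zero when the queue is empty
            "maxvCpus": max_vcpus,
            "instanceTypes": instance_types,
            # subnets, securityGroupIds, and instanceRole are also required
            # by the real API and are omitted from this sketch.
        },
    }


def job_queue(name, env_names):
    """Build a job queue body; environments are tried in ascending order."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": 10,
        "computeEnvironmentOrder": [
            {"order": i, "computeEnvironment": env}
            for i, env in enumerate(env_names, start=1)
        ],
    }


# Hypothetical names and instance types, for illustration only.
spot_env = compute_environment("gpu-spot", "SPOT", 512, ["g5.12xlarge", "p3.8xlarge"])
on_demand_env = compute_environment("gpu-on-demand", "EC2", 128, ["g5.12xlarge"])
queue = job_queue("gpu-batch-queue", ["gpu-spot", "gpu-on-demand"])
```

Listing several instance types in the Spot environment gives the scheduler more capacity pools to draw from, which tends to reduce interruptions.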
4. Cluster Placement Groups for Performance Optimization
To reduce job runtime and improve efficiency, QyrosCloud implemented cluster placement groups.
Benefits included:
- low-latency communication between instances
- improved network throughput
- faster execution of distributed workloads
This optimization significantly reduced the time required to complete batch jobs, directly lowering compute costs.
5. FinOps Visibility and Cost Optimization Framework
QyrosCloud introduced a FinOps framework to continuously optimize cloud spend.
Capabilities included:
- identification of idle and underutilized resources
- rightsizing of instance types
- cost allocation and tagging strategy
- continuous monitoring of AWS spend
This enabled the organization to maintain cost efficiency over time.
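A minimal sketch of how idle and underutilized resources might be flagged from utilization samples follows. The thresholds, node names, and sample values are hypothetical; in practice the samples would come from a monitoring source, and the cutoffs would be tuned per workload.

```python
from statistics import mean


def classify_utilization(samples, idle_below=5.0, underused_below=40.0):
    """Label a resource from its average utilization percentage.
    Both thresholds are assumed values, not recommendations."""
    if not samples:
        raise ValueError("need at least one utilization sample")
    avg = mean(samples)
    if avg < idle_below:
        return "idle"
    if avg < underused_below:
        return "underutilized"
    return "ok"


# Hypothetical GPU utilization samples (percent) per node.
fleet = {
    "gpu-node-a": [1.0, 2.5, 0.5],
    "gpu-node-b": [22.0, 35.0, 18.0],
    "gpu-node-c": [80.0, 92.0, 75.0],
}
report = {name: classify_utilization(s) for name, s in fleet.items()}
print(report)
```

Nodes labeled idle are candidates for shutdown, and underutilized nodes are candidates for rightsizing to a smaller instance type.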
Results & Business Impact
The engagement delivered significant cost savings and performance improvements.
- 40% cost reduction across HPC workloads: Monthly AWS spend reduced from $150,000 to ~$90,000
- Faster completion of distributed HPC jobs: Reduced batch job runtime through optimized networking and placement groups
- Optimized Resource Utilization: Majority of workloads shifted to Spot Instances
- Scalable and Cost-Efficient HPC Platform: Dynamic scaling of compute resources based on workload demand
- FinOps-Driven Cloud Operations: Sustainable long-term cost management strategy
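The headline figures follow directly from the monthly spend numbers reported above. The short calculation below reproduces them; since the post-optimization spend is stated as approximately $90,000, the results are approximate as well.

```python
monthly_before = 150_000  # reported monthly AWS spend before the engagement (USD)
monthly_after = 90_000    # reported approximate monthly spend afterward (USD)

monthly_saving = monthly_before - monthly_after
reduction = monthly_saving / monthly_before
annual_saving = monthly_saving * 12

print(f"monthly saving: ${monthly_saving:,}")
print(f"reduction: {reduction:.0%}")
print(f"annual saving: ${annual_saving:,}")
```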
Technology Stack
AWS Services
3rd-Party Tool
About Therapeutics and Biotech
A leading therapeutics and biotechnology company specializing in protein modulation and drug discovery relied on large-scale high-performance computing (HPC) workloads to run batch simulations and data processing pipelines.