Life Sciences | Therapeutics and Biotech

Reducing AWS HPC Costs by 40%: How QyrosCloud Saved $720K Annually for a Leading Life Sciences Company

Customer Challenge

The customer, a therapeutics and biotechnology company, runs large-scale batch workloads to support its drug discovery pipelines, including:

  • molecular simulations
  • protein interaction modeling
  • high-throughput data processing
  • GPU-accelerated computational workloads

As workloads scaled, cloud costs grew rapidly.

Key Challenges

High Cost of On-Demand GPU Instances

The organization relied heavily on on-demand GPU instances, which are among the most expensive compute resources in AWS. Without any commitment-based discounts or a defined cost optimization strategy, the company incurred significant overspending, particularly during periods of peak workload demand.

Inefficient Job Scheduling and Resource Utilization

Batch jobs were not optimized for either performance or cost efficiency. The environment suffered from suboptimal instance selection, underutilized compute resources, and inefficient job execution times. As a result, workloads took longer to complete, leading to increased compute consumption and higher overall AWS spend.

Lack of FinOps Visibility

The organization lacked centralized visibility into key cost drivers, including idle resources and rightsizing opportunities. Without this level of insight, it was difficult to proactively manage cloud spend or implement effective cost optimization strategies across the environment.

QyrosCloud Solution

QyrosCloud implemented a comprehensive FinOps optimization strategy focused on cost reduction, performance improvement, and operational efficiency.

1. Hybrid Compute Strategy: Spot + Reserved Capacity

The team redesigned the compute strategy to replace on-demand instances with a combination of:

  • Spot Instances for fault-tolerant batch workloads
  • Reserved Instances / Savings Plans for baseline compute demand

This hybrid approach significantly reduced compute costs while maintaining reliability.

Key improvements

  • prioritized Spot capacity for GPU workloads
  • implemented fallback mechanisms to ensure job completion
  • aligned Reserved capacity with predictable workloads
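
One way a Spot-first setup with a fallback can be expressed in AWS Batch is a job queue that lists a Spot compute environment ahead of an on-demand one, combined with a retry rule for Spot reclamations. The sketch below only builds request payloads shaped like the boto3 Batch calls `create_compute_environment` and `create_job_queue`; it does not describe the customer's actual configuration. The environment names, instance family, vCPU limits, subnet, security group, and instance role are hypothetical placeholders, and the use of on-demand capacity as the fallback is an assumption.

```python
# Sketch only: payloads shaped like boto3 Batch requests.
# Every name, limit, and ID below is a hypothetical placeholder.


def compute_environment(name: str, capacity_type: str, max_vcpus: int) -> dict:
    """Build kwargs for batch.create_compute_environment()."""
    is_spot = capacity_type == "SPOT"
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": capacity_type,               # "SPOT" or "EC2" (on-demand)
            "allocationStrategy": (
                "SPOT_CAPACITY_OPTIMIZED" if is_spot else "BEST_FIT_PROGRESSIVE"
            ),
            "minvCpus": 0,                       # scale to zero when idle
            "maxvCpus": max_vcpus,
            "instanceTypes": ["p3"],             # hypothetical GPU family
            "subnets": ["subnet-EXAMPLE"],       # placeholder
            "securityGroupIds": ["sg-EXAMPLE"],  # placeholder
            "instanceRole": "ecsInstanceRole",   # placeholder instance profile
        },
    }


def job_queue(name: str, ordered_envs: list) -> dict:
    """Build kwargs for batch.create_job_queue(); lower order is tried first."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": 10,
        "computeEnvironmentOrder": [
            {"order": position, "computeEnvironment": env}
            for position, env in enumerate(ordered_envs, start=1)
        ],
    }


# Retry jobs whose host was reclaimed; exit on any other failure reason.
SPOT_RETRY_STRATEGY = {
    "attempts": 3,
    "evaluateOnExit": [
        {"onStatusReason": "Host EC2*", "action": "RETRY"},
        {"onReason": "*", "action": "EXIT"},
    ],
}

spot_env = compute_environment("gpu-spot", "SPOT", max_vcpus=512)
fallback_env = compute_environment("gpu-on-demand", "EC2", max_vcpus=128)
queue = job_queue("gpu-batch-queue", ["gpu-spot", "gpu-on-demand"])
```

With real values filled in, these dictionaries could be passed as keyword arguments to a `boto3.client("batch")` client, and `SPOT_RETRY_STRATEGY` could be supplied as the `retryStrategy` of a job definition. Listing the Spot environment first makes it the preferred target, while the second environment serves as overflow capacity.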

2. HPC Optimization with AWS ParallelCluster

Using AWS ParallelCluster, QyrosCloud optimized the HPC environment to better support distributed workloads.

Enhancements included:

  • optimized cluster scaling policies
  • improved job scheduling efficiency
  • automated provisioning of compute nodes
  • better utilization of GPU resources

This ensured compute resources were allocated dynamically based on workload demand.

3. Efficient Batch Processing with AWS Batch

QyrosCloud reconfigured AWS Batch environments to improve job orchestration.

Improvements included:

  • optimized compute environments for cost efficiency
  • improved job queue prioritization
  • dynamic scaling of compute resources
  • better alignment between job requirements and instance types

4. Cluster Placement Groups for Performance Optimization

To reduce job runtime and improve efficiency, QyrosCloud implemented cluster placement groups.

Benefits included:

  • low-latency communication between instances
  • improved network throughput
  • faster execution of distributed workloads

This optimization significantly reduced the time required to complete batch jobs. Because EC2 compute is billed for the time instances run, shorter runtimes directly lowered compute costs.
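
For readers unfamiliar with placement groups, the following is a minimal sketch of how a cluster placement group might be requested through the EC2 API. It only builds keyword arguments shaped like the boto3 calls `create_placement_group` and `run_instances`. The group name, AMI ID, instance type, and node count are hypothetical placeholders, not details of the customer's environment.

```python
# Sketch only: kwargs shaped like boto3 EC2 requests.
# Group name, AMI ID, instance type, and count are hypothetical.


def placement_group_request(name: str) -> dict:
    """Kwargs for ec2.create_placement_group(); 'cluster' packs instances
    close together inside one Availability Zone for low-latency networking."""
    return {"GroupName": name, "Strategy": "cluster"}


def launch_request(ami_id: str, instance_type: str, count: int, group: str) -> dict:
    """Kwargs for ec2.run_instances() that place every node in the group.

    MinCount == MaxCount makes the launch all-or-nothing, so the nodes
    are requested together in a single call.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "Placement": {"GroupName": group},
    }


group_kwargs = placement_group_request("hpc-sim-cluster")
launch_kwargs = launch_request("ami-EXAMPLE", "p3.8xlarge", 4, "hpc-sim-cluster")
```

With real identifiers, these dictionaries could be unpacked into `boto3.client("ec2").create_placement_group(**group_kwargs)` and `run_instances(**launch_kwargs)`. Managed services such as AWS Batch and AWS ParallelCluster expose their own placement group settings, so a direct EC2 call like this is only one possible route.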

5. FinOps Visibility and Cost Optimization Framework

QyrosCloud introduced a FinOps framework to continuously optimize cloud spend.

Capabilities included:

  • identification of idle and underutilized resources
  • rightsizing of instance types
  • cost allocation and tagging strategy
  • continuous monitoring of AWS spend

This enabled the organization to maintain cost efficiency over time.
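
As an illustration of the first and third capabilities, the sketch below flags idle, underutilized, and untagged instances from utilization samples. It is a simplified stand-in, not the framework QyrosCloud built: the `InstanceUsage` record, the 5% and 40% thresholds, the tag field, and the sample fleet are all hypothetical. In practice the samples would come from a monitoring source rather than hard-coded lists.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class InstanceUsage:
    """Hypothetical record: one instance and its recent GPU utilization (%)."""
    instance_id: str
    cost_center_tag: Optional[str]
    gpu_util_samples: List[float]


def classify(usage: InstanceUsage,
             idle_below: float = 5.0,
             underused_below: float = 40.0) -> str:
    """Label an instance by average utilization (thresholds are assumptions)."""
    if not usage.gpu_util_samples:
        return "no-data"
    average = mean(usage.gpu_util_samples)
    if average < idle_below:
        return "idle"
    if average < underused_below:
        return "underutilized"
    return "ok"


def review(fleet: List[InstanceUsage]) -> Dict[str, List[str]]:
    """Group instance IDs into findings a FinOps review could act on."""
    findings: Dict[str, List[str]] = {
        "idle": [], "underutilized": [], "untagged": []
    }
    for usage in fleet:
        label = classify(usage)
        if label in findings:
            findings[label].append(usage.instance_id)
        if usage.cost_center_tag is None:
            findings["untagged"].append(usage.instance_id)
    return findings


# Hypothetical fleet, for illustration only.
fleet = [
    InstanceUsage("i-aaa", "discovery", [1.0, 2.0, 0.0]),
    InstanceUsage("i-bbb", None, [20.0, 35.0, 30.0]),
    InstanceUsage("i-ccc", "simulation", [85.0, 90.0, 95.0]),
]
findings = review(fleet)
```

Idle instances are candidates for termination, underutilized ones for rightsizing, and untagged ones need a cost allocation tag before their spend can be attributed to a team.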

Results & Business Impact

The engagement delivered significant cost savings and performance improvements.

40% cost reduction across HPC workloads

Monthly AWS spend reduced from $150,000 to ~$90,000

Faster completion of distributed HPC jobs

Reduced batch job runtime through optimized networking and placement groups

Optimized Resource Utilization

Majority of workloads shifted to Spot Instances

Scalable and Cost-Efficient HPC Platform

Dynamic scaling of compute resources based on workload demand

FinOps-Driven Cloud Operations

Sustainable long-term cost management strategy
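
The headline figures can be cross-checked with simple arithmetic. The snippet below uses only the monthly spend numbers reported in the results and derives the percentage reduction and the annualized savings from them.

```python
# Figures taken from the reported results.
monthly_before = 150_000   # USD per month before the engagement
monthly_after = 90_000     # USD per month after (approximate)

monthly_savings = monthly_before - monthly_after         # 60,000
reduction_pct = 100 * monthly_savings / monthly_before   # 40.0
annual_savings = 12 * monthly_savings                    # 720,000

print(f"Monthly savings: ${monthly_savings:,}")
print(f"Reduction: {reduction_pct:.0f}%")
print(f"Annual savings: ${annual_savings:,}")
```

The result matches the 40% reduction and the $720K annual savings quoted in the title.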

Technology Stack

AWS Services

  • Amazon EC2
  • AWS Batch
  • AWS ParallelCluster
  • Amazon Aurora

3rd-Party Tool

Slurm

About Therapeutics and Biotech

A leading therapeutics and biotechnology company specializing in protein modulation and drug discovery relied on large-scale high-performance computing (HPC) workloads to run batch simulations and data processing pipelines.