High-Performance Computing at Scale with EESSI

How Inuits is helping bring AMD ROCm support to EESSI, reducing “Time-to-Science” from hours to seconds for researchers across Europe.

Bridging the Gap in Scientific Computing: Bringing AMD ROCm to EESSI

At Inuits, we believe that open-source infrastructure should be a bridge, not a barrier, for the scientific community. Our recent work with the European Environment for Scientific Software Installations (EESSI) highlights how we collaborate with partners like Microsoft and HPC-UGent to streamline High-Performance Computing (HPC).

The Challenge: Closing the “Time-to-Science” Gap

Traditionally, researchers have spent hours or even days configuring complex scientific software stacks before they could run a single simulation. Our goal within the EESSI project is to reduce this “Time-to-Science” from hours to seconds. This is particularly challenging when supporting diverse hardware, because software has to be built and optimized for each combination of CPU architecture and GPU compute capability.
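To give a feel for why hardware diversity multiplies the work, here is a minimal Python sketch that enumerates CPU and GPU combinations. The target names are illustrative assumptions only and are not an authoritative list of what EESSI supports; the helper `build_matrix` is hypothetical.

```python
from itertools import product

# Illustrative targets only (assumptions for this sketch),
# not an authoritative list of EESSI-supported hardware.
CPU_TARGETS = ["x86_64/intel/skylake_avx512", "x86_64/amd/zen3", "aarch64/neoverse_v1"]
GPU_TARGETS = ["nvidia/cc70", "nvidia/cc80", "amd/gfx90a"]


def build_matrix(cpu_targets, gpu_targets):
    """Return every (CPU, GPU) pair that would need its own optimized build."""
    return list(product(cpu_targets, gpu_targets))


matrix = build_matrix(CPU_TARGETS, GPU_TARGETS)
print(f"{len(matrix)} combinations to build")
for cpu, gpu in matrix:
    print(f"  {cpu} + {gpu}")
```

Even with three CPU targets and three GPU targets, the number of builds grows as the product of the two lists, which is why automating these builds matters.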

Our Contribution: Active Community Engagement

We don’t just implement these solutions; we are deeply involved in the underlying software ecosystem. In 2024, Inuits contributed approximately 13% of all easyconfig pull requests (PRs) to the EasyBuild project. Among our key contributors, individual totals reached 181 and 155 PRs.

As featured speakers for EESSI, our consultants recently demonstrated how this expertise translates into practical cloud deployments.

The Solution: Standardizing GPU Support

Our team focused on making GPU-enabled software “plug-and-play” across different environments, including NVIDIA CUDA and AMD ROCm. By leveraging Azure VM snapshots, we developed a workflow that allows researchers to jump directly into their work:

  • Post-Driver Snapshots: Environments with pre-installed drivers ready for immediate use.
  • Post-CUDA Snapshots: Ready for specialized software builds.
  • Ready-to-Run Snapshots: Delivering instant science capability.
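The three snapshot types above can be read as checkpoints along a provisioning pipeline: the later the snapshot, the less work remains before a job can start. The Python sketch below models that idea. The stage names follow the list above, while the step descriptions, the `BARE_VM` stage, and the `remaining_steps` helper are hypothetical and do not describe the actual Azure workflow.

```python
from enum import IntEnum


class SnapshotStage(IntEnum):
    """Checkpoints in the provisioning pipeline, in order."""
    BARE_VM = 0        # assumed starting point, no snapshot used
    POST_DRIVER = 1    # GPU drivers already installed
    POST_CUDA = 2      # drivers and CUDA already installed
    READY_TO_RUN = 3   # scientific software already available


# Hypothetical step descriptions: entry i moves a VM from stage i to stage i + 1.
PROVISIONING_STEPS = [
    "install GPU driver",
    "install CUDA toolkit",
    "install scientific software",
]


def remaining_steps(stage):
    """Return the steps still needed when starting from the given snapshot stage."""
    return PROVISIONING_STEPS[stage:]


for stage in SnapshotStage:
    print(f"{stage.name}: {remaining_steps(stage)}")
```

Starting from a Ready-to-Run snapshot leaves no provisioning steps, which is the situation the workflow aims for.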

The Results: Performance at Scale

Our tests across various Azure VM types, from the NC T4 to the ND A100, confirmed that these optimized environments shorten deployment time and speed up GPU-capable applications.

  • Deployment Speed: We achieved a “Time-to-Science” of just 18 to 20 seconds for complex simulations.
  • Processing Power: Scientific applications like GROMACS ran 4–5× faster on GPUs, while ESPResSo saw speed increases of 10–13×.
  • Stability: Our GPU-enabled stacks provided consistent performance and, in some cases, avoided stability issues seen in CPU-only runs.

Looking Ahead

Following our work on the NVIDIA stack, we are now focusing on enhancing support for the AMD ROCm ecosystem. This includes mapping software dependencies and developing automated build recipes to ensure that researchers have a consistent experience regardless of their hardware choice.
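As a rough illustration of what mapping software dependencies involves, the Python sketch below derives a valid build order from a dependency map using the standard library's `graphlib`. The package names and dependency edges are placeholders chosen for this example and do not reflect the real ROCm dependency graph; `build_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Placeholder packages and edges (assumptions for this sketch).
# Each key maps to the set of packages it depends on.
DEPENDS_ON = {
    "app": {"math-lib", "gpu-runtime"},
    "math-lib": {"gpu-runtime", "compiler"},
    "gpu-runtime": {"compiler"},
    "compiler": set(),
}


def build_order(depends_on):
    """Return the packages ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
print(" -> ".join(order))
```

A topological sort also raises `graphlib.CycleError` when the map contains a circular dependency, which is useful to catch before any build starts.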