Beyond the cloud: handling on-demand workloads with VOLTA elastic distributed execution
Written by Matteo Gazzin
6 November 2020 · 5 min read
Over the last two decades, simulation and design optimization have become key to developing better products while reducing costs and time to market. This process is continuously evolving, increasing the level of detail of analysis and integrating multiple disciplines. The end goal of this process is multidisciplinary optimization (MDO), where engineers with different domain expertise contribute to finding the overall optimum system solution.
Nowadays, the challenges of multidisciplinary design and optimization include collaboration, process automation, execution, and data analytics. To fulfill these requirements, medium and large companies need, at the very least, a common or shared repository infrastructure and a complex and powerful hardware environment where engineers can manage data and execute simulations.
From the execution perspective, interest is growing in Software as a Service (SaaS), or on-demand cloud platforms, to complement in-house computing resources. Coupling and integrating these services, while preserving security and intellectual property, are crucial aspects of the execution environment.
From a user perspective, being able to take advantage of different solvers and tools in MDO is also pivotal. Users need to create a process automation workflow where simulations are computed and data, results and files are exchanged between the different tools to compute the overall performance of the system. At the same time, traceability and reproducibility of generated results must be guaranteed in all steps and for each discipline.
High performance computing (HPC) to cut simulation execution time
From the execution point of view, all these aspects are managed by the IT department, and the complexity of creating, configuring, and maintaining computing resources is usually hidden from end users. The main issues to address are software availability, the management of different versions, security access levels, and licenses. High-performance computing (HPC) systems help IT departments group computing resources into a single, powerful, centrally managed cluster of machines. They enable engineers to cut the execution time of heavy simulations from weeks to hours.
Tools like modeFRONTIER, ESTECO’s desktop application, enable engineers not only to link all the analysis software required for the design study, whether it runs on HPC or on standalone workstations, but also to perform design space exploration and optimization studies.
VOLTA, the web-based Simulation Process and Data Management (SPDM) solution developed by ESTECO, enables collaboration between experts across the variety of disciplines and tools required for designing complex systems. The data management module enables users to easily store and access resources, manage personal items, share content, and save data and results. Process automation workflows, created by experts and available to other users, drive single-discipline or multidisciplinary design explorations and optimization runs. Moreover, in the data analysis and post-processing environment, results can be analyzed and final decisions can be made collaboratively.
HPC meets VOLTA distributed execution
The backbone of the system is a distributed execution environment that consists of a dynamic network of execution servers, organized into queues. Execution queues are managed by the VOLTA System Administrator and are assigned to specific users or teams. Computing resources, such as clusters or clouds, are dynamically assigned to queues by connecting the VOLTA Players: distributed, multi-platform run-time engines that execute VOLTA projects. Fine-grained configuration settings enable the assignment of specific permissions to users and groups and the definition of specific hardware limitations and software availability. This approach has two main advantages: users have multiple available resources and the IT department maintains ownership of computing resources, thereby complying with company access regulations.
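To make the relationship between queues, players, and permissions concrete, here is a minimal sketch in Python. It is not VOLTA's API: the `Player` and `ExecutionQueue` classes, their fields, and the sample names are all hypothetical, chosen only to illustrate how a queue could match a request against user permissions, software availability, and hardware limits.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Player:
    """Hypothetical stand-in for a run-time engine attached to a queue."""
    name: str
    software: set[str]   # software available on this resource
    max_cores: int       # hardware limit set by the administrator
    busy: bool = False


@dataclass
class ExecutionQueue:
    """Hypothetical queue: a group of players plus the users allowed to use it."""
    name: str
    allowed_users: set[str]
    players: list[Player] = field(default_factory=list)

    def connect(self, player: Player) -> None:
        # Resources join a queue dynamically when a player connects to it.
        self.players.append(player)

    def find_player(self, user: str, software: str, cores: int) -> Player | None:
        # Permission check first: the queue must be assigned to this user.
        if user not in self.allowed_users:
            return None
        # Then pick the first idle player that has the software and enough cores.
        for player in self.players:
            if (not player.busy
                    and software in player.software
                    and cores <= player.max_cores):
                return player
        return None


queue = ExecutionQueue("cfd-team", allowed_users={"alice"})
queue.connect(Player("hpc-node-1", software={"solver-a"}, max_cores=64))
queue.connect(Player("workstation-7", software={"solver-b"}, max_cores=8))

print(queue.find_player("alice", "solver-b", cores=4).name)
print(queue.find_player("bob", "solver-b", cores=4))
```

In this sketch the first lookup succeeds because the user is assigned to the queue and an idle player offers the requested software. The second fails on the permission check alone, which mirrors how the IT department keeps ownership of the resources.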
Even if the concept may look similar to what an HPC system does, the VOLTA distributed execution environment is not meant to replace HPC systems. Its purpose is to integrate with them and simplify access to them.
The next step is to use computing resources efficiently by dynamically allocating and deallocating them as computational workloads change. VOLTA Elastic Distributed Execution is an enhancement of the existing environment that leverages all these aspects to enable the efficient use of computing resources, from local workstations to company HPC systems or private and public cloud environments, by using Docker containers orchestrated in a Kubernetes cluster. Containers are self-contained environments, which can be easily cloned, modified, destroyed, and saved without impacting complex systems such as HPC infrastructures.
This enables the use of several software versions, if required, without having to deal with difficult upgrades or version selection. Containers for new software can be easily tested in pre-production environments and promoted to the production level without having to perform complex and error-prone operations twice.
At the same time, users do not have to deal with this complexity. The key aspect of the VOLTA Elastic Distributed Execution environment is that users are not responsible for activating containers before submitting an execution request. The system takes care of this automatically. Once a task is requested, VOLTA checks the process automation workflow and analyzes the integration software and specific requirements. If the requirements for the execution are met, VOLTA automatically starts the containers that are needed for execution. Once the task is concluded, the containers are shut down, freeing up computing resources for other tasks.
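The lifecycle described above (inspect the workflow, check the requirements, start containers, run, shut down) can be sketched as follows. This is an illustration under stated assumptions, not VOLTA's implementation: the image catalogue, the `containers_for` and `execute` functions, and the workflow format are hypothetical, and adding to or removing from a set stands in for actually starting and stopping containers.

```python
from contextlib import contextmanager

# Hypothetical catalogue of container images, keyed by (software, version).
AVAILABLE_IMAGES = {("solver-a", "2020.1"), ("solver-a", "2019.2"), ("mesher", "3.4")}

running = set()  # containers currently alive, for illustration only


@contextmanager
def containers_for(requirements):
    """Start one container per requirement and always shut them down afterwards."""
    missing = [req for req in requirements if req not in AVAILABLE_IMAGES]
    if missing:
        # Requirements not met: nothing is started.
        raise LookupError(f"requirements not met: {missing}")
    started = []
    try:
        for req in requirements:
            running.add(req)      # stand-in for starting a real container
            started.append(req)
        yield started
    finally:
        for req in started:
            running.discard(req)  # stand-in for stopping it and freeing resources


def execute(workflow):
    # The workflow declares the software it integrates; the user starts nothing by hand.
    requirements = [(node["software"], node["version"]) for node in workflow]
    with containers_for(requirements) as started:
        return [f"ran {name} {version}" for name, version in started]


workflow = [
    {"software": "mesher", "version": "3.4"},
    {"software": "solver-a", "version": "2019.2"},
]
results = execute(workflow)
print(results)
print(running)  # empty again once the task has concluded
```

Using a context manager means the shutdown step runs even if the task fails midway, so resources are released in every case. Two versions of the same software can also coexist in the catalogue, and each workflow simply names the one it needs.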
Beyond the benefits from an IT point of view, this approach also affects two aspects of MDO engineering simulation: traceability and reproducibility. VOLTA has a complete traceability schema for each component used in the generation of results: workflows, strategies, models, and files. For each of these, users can easily track which version was used when the results were generated. With the elastic environment, execution resources are traced as well. This helps users, even years later, to reuse or recreate the environment used for data generation and, with the information regarding files and models, to reproduce results and data.
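As a rough illustration of why tracing the execution environment matters for reproducibility, the sketch below stores the versions of every component next to the container images used, and compares runs by a hash of that record. The `RunRecord` class, its fields, and the version strings are hypothetical and do not reflect VOLTA's actual traceability schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical trace of everything that produced one set of results."""
    workflow_version: str
    strategy_version: str
    model_versions: tuple
    file_versions: tuple
    container_images: tuple  # the execution environment, traced with the rest

    def fingerprint(self) -> str:
        # Stable hash of the whole record: equal fingerprints mean the same
        # inputs and the same execution environment were recorded.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


original = RunRecord("wf-12", "doe-3", ("wing-v4",), ("mesh.cfg@7",), ("solver-a:2019.2",))
rerun = RunRecord("wf-12", "doe-3", ("wing-v4",), ("mesh.cfg@7",), ("solver-a:2019.2",))
upgraded = RunRecord("wf-12", "doe-3", ("wing-v4",), ("mesh.cfg@7",), ("solver-a:2020.1",))

print(original.fingerprint() == rerun.fingerprint())     # same inputs and environment
print(original.fingerprint() == upgraded.fingerprint())  # solver image differs
```

Without the `container_images` field, the original and the upgraded run would look identical on paper even though a different solver version produced the results.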