Training title: High-Performance Computing (HPC) Visualization

Date: May 14-15, 2025

Language: English

Event format: hybrid (at Barcelona Supercomputing Center and online)

Video Recording: www.youtube.com/watch?v=XjJ4XQE9gIc

Wednesday, May 14th, 2025

9.00 – 9.30: Participant registration

9.30 – 9.45: Welcome and workshop introduction by Guillermo Marin (BSC)

9.45 – 10.30: Scientific Visualization in the Big Data era: The Square Kilometre Array challenges and ambitions by Giuseppe Tudisco (INAF)

Giuseppe Tudisco holds a Master's degree in Computer Science (Network and Security Systems) from the University of Catania (2020). He is a technologist at the Italian National Institute for Astrophysics (INAF), based at the Astrophysical Observatory of Catania. His expertise spans scientific visualization, data analytics, and distributed computing, and he leads the development of the VisIVO tools. He has contributed to several international projects, including H2020 NEANIAS, ERC ECOGAL, SKA, and the Italian National Centre for HPC, Big Data, and Quantum Computing. Since 2022, he has been actively involved in the development of SKA Regional Centres, serving as Scrum Master for two Agile teams: Orange, focused on visualization prototypes for SKA, and Azure, dedicated to the development of the Italian Regional Centre (ITSRC).

The Square Kilometre Array (SKA) Observatory represents one of the most ambitious scientific projects in astrophysics, designed to address fundamental questions about the origins, structure, and evolution of the universe. With its unprecedented capacity for data generation, the SKA will produce exabytes of information annually, posing extraordinary challenges in data processing, storage, and scientific visualization. Central to this endeavour are the SKA Regional Centres (SRCs), distributed facilities that will play a pivotal role in enabling researchers worldwide to access, process, and analyse these vast datasets efficiently.

This talk will delve into the critical role of the SRCs as the operational backbone of the SKA ecosystem. We will explore their contributions to the pipeline that transforms raw observational data into actionable scientific insights, ensuring that this knowledge can be shared effectively across the global scientific community. Special attention will be given to the challenges of scientific visualization in the Big Data era, including the development of scalable techniques and tools to manage and interpret multi-dimensional datasets.

10.30 – 11.15: SPACE: Scalable Parallel Astrophysics Codes for Exascale by Andrea Mignone (University of Turin) and Luca Tornatore (INAF)

This talk emphasizes the critical role of High Performance Computing (HPC) in advancing astrophysics and cosmology, particularly through simulations that help scientists understand complex phenomena. As exascale computing systems become essential for handling the massive data from large-scale observatories like SKA, CTA, and ELT, the SPACE Centre of Excellence (CoE) is focused on adapting key astrophysical codes to these advanced systems.

The talk also highlights visualization and machine learning as tools to manage and analyse the vast outputs of exascale simulations. These techniques help in processing large datasets efficiently and improving the interpretability of simulation results. Machine learning, in particular, enhances data analysis capabilities, while visualization tools allow for the effective representation and understanding of complex astrophysical data.

11.15 – 11.45: Coffee break

11.45 – 12.15: High Performance Visualization with VisIVO by Nicola Tuccari (INAF)

Nicola Tuccari is a PhD student in Computer Science at the University of Catania and INAF. He holds a master's degree in Computer Science. His main research topics are data visualization and parallel computing. He is currently involved in the Italian National Center for HPC, Big Data and Quantum Computing and actively collaborates with the SPACE CoE project.

VisIVO is a suite of graphical applications designed to facilitate data visualization and exploration while providing advanced techniques for analytical visualization. Among these applications, VisIVO Server stands out as a modular platform for visualizing large-scale datasets, capable of deployment across distributed computing infrastructures. Over the years, VisIVO Server has evolved to address the increasing data volumes generated by next-generation observational facilities and large-scale simulation frameworks. Recently, it has been updated to leverage Exascale infrastructures, enhancing its capability to process massive datasets efficiently. The importing module has been optimized for parallel read and write operations, significantly accelerating the conversion of simulation outputs—such as those from OpenGadget—into a format suitable for high-performance visualization. Furthermore, VisIVO Server now supports in-situ visualization through integration with Hecuba, a specialized toolset for managing persistent data in Big Data applications. To enhance its portability and reproducibility, it has also been integrated with Streamflow, a Workflow Management System, enabling the execution of hybrid workflows across cloud and high-performance computing (HPC) infrastructures. Additionally, VisIVO Server has been refactored to improve code readability and make it easier to add new functionalities. These advancements ensure that VisIVO Server remains a powerful solution for large-scale scientific visualization in the Exascale era.

12.15 – 13.00: Using Hecuba to support in-situ visualization by Yolanda Becerra and Enric Sosa (BSC)

Yolanda Becerra is a full-time associate professor at the Computer Architecture Department of the Universitat Politecnica de Catalunya (UPC) and an associate researcher at the Barcelona Supercomputing Center (BSC). She received her PhD in Computer Science from the UPC in 2006. In 2007 she joined the BSC, where she currently leads the Data-Driven Scientific Computing research activity in the Workflows and Distributed Computing group. Her research interests focus on designing data management policies that improve the performance of scientific applications, and on designing models and interfaces that make it easier for scientists to analyse their data.

Enric Sosa is a junior research engineer at the Barcelona Supercomputing Center. In 2020 he received his BSc in Informatics Engineering from the Universitat Politecnica de Catalunya (UPC). He has been part of the Data-Driven Scientific Computing group since 2018. His research focuses on hardware performance and data models for storage systems.

In this talk, we will introduce Hecuba, a set of tools that facilitates data management for users and programmers. Hecuba implements an object mapper that hides the particularities of data backends. This allows programmers to access data as if it were volatile in-memory objects, using the interface provided by the programming language, even though the data is actually persistent or streamed. We will show how Hecuba can help to enable in-situ visualization with little effort from programmers, through a use case based on ParaView and ChaNGA: ParaView visualizes the data as ChaNGA produces it.
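The object-mapper idea in the abstract above can be illustrated with a toy sketch. It does not use Hecuba's actual API: the names `DictBackend` and `PersistentObject` are hypothetical, and a plain dictionary stands in for a real persistent or streaming backend. The point is only that the producer and the consumer work with ordinary attribute access while the data lives elsewhere.

```python
"""Toy object mapper: attribute access is forwarded to a storage backend.

This is NOT Hecuba's API; every name here is hypothetical and illustrative.
"""


class DictBackend:
    """Stand-in for a persistent store (a real backend would be a database or stream)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows[key]


class PersistentObject:
    """Looks like a plain in-memory object, but its data attributes live in the backend."""

    def __init__(self, name, backend):
        # Bypass our own __setattr__ for the two bookkeeping fields.
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_backend", backend)

    def __setattr__(self, attr, value):
        # Every data attribute is written to the backend, keyed by object name.
        self._backend.put((self._name, attr), value)

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. for data attributes.
        try:
            return self._backend.get((self._name, attr))
        except KeyError:
            raise AttributeError(attr) from None


backend = DictBackend()

# Producer side (for example, a simulation writing a snapshot).
snapshot = PersistentObject("step_0001", backend)
snapshot.positions = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]

# Consumer side (for example, a visualization tool) attaches by name.
view = PersistentObject("step_0001", backend)
print(view.positions)
```

In this sketch the consumer never copies or parses a file; it simply binds to the same name on the same backend, which is the property that makes an in-situ setup possible.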

13.00 – 14.30: Lunch

14.30 – 15.30: Scientific discovery using Representation Learning to interpret the largest cosmological simulations by Sebastian Trujillo Gomez and Bernd Doser (HITS)

Sebastian Trujillo Gomez is a theoretical astrophysicist who has dedicated his career to studying the formation of large-scale structure and galaxies in a cosmological context and the nature of dark matter. His research currently focuses on using Deep Learning to enable scientific discovery from the largest simulated and observational datasets, and in particular, developing methods to robustly confront cosmological and galaxy formation models with observational data. He is currently a research scientist in the Astroinformatics group at the Heidelberg Institute for Theoretical Studies in Germany.

Bernd Doser is currently employed as a senior scientific software engineer at the Heidelberg Institute for Theoretical Studies. His responsibilities include implementing modern software engineering practices and maintaining various open-source packages across different scientific disciplines. Bernd holds a Ph.D. in computational chemistry and is an expert in high-performance computing, numerical algorithms, and machine learning.

Numerical simulations are the best approximation to experimental laboratories in cosmology. However, running the simulation is only the first step. Interpreting and analysing the outputs is an essential component of the discovery process, but the large size and high dimensionality of cosmological simulations severely limit the interpretability of their predictions. We will present a new assumption-free approach and tools to maximize scientific discoveries using cosmological simulations. The tools can be applied to today’s largest simulations and will be essential to solve the extreme data access, exploration, and analysis challenges posed by the exascale computing era. Our software tools can run on both local machines and HPC resources. They automatically learn compact representations of complex objects such as simulated or observed galaxies in a low-dimensional space that naturally describes their features. The data is then seamlessly projected onto this representation space for interactive inspection, visual interpretation, sample selection, and local analysis. We will demonstrate the workflow using ~60k simulated galaxies from IllustrisTNG to render an interactive visualization of a morphological similarity space on the surface of a hierarchical sphere designed to handle arbitrarily large simulations containing millions of galaxies. Lastly, we will discuss the potential use of the tool for the robust comparison of simulations with multimodal data from large galaxy surveys, including model selection and simulation-based inference.

15.30 – 16.30: Generalizing ICL Predictions Across Simulations: A Deep Learning Approach by Marta Barroso Isidoro and Pablo Agustin Martin Torres (BSC)

Marta Barroso Isidoro has more than four years of experience applying Machine Learning and Deep Learning to diverse research projects, including CIBERES-UCI-COVID and KnowlEdge, where she served as Principal Investigator. She is also skilled in project management, overseeing planning, resource allocation, engineering tasks, and reporting with precision and clarity.

Pablo Agustin Martin Torres works at the intersection of AI and Stochastic Geometry, with more than two years of experience as an AI researcher training and evaluating visual-language models. His other research interests include the theoretical and practical foundations of Multimodal Generative AI and Mechanistic Interpretability.

We develop a machine learning framework to infer intracluster light (ICL) properties from velocity dispersion maps in simulated galaxy clusters, using deep learning models trained on mock images from multiple hydrodynamical simulations, including DIANOGA, Illustris, Magneticum, MillenniumTNG, and FLAMINGO. By leveraging synthetic data with projection variations, our approach aims to generalize across different simulation environments without dependence on specific physical models. This work presents a simulation-independent method for studying ICL, bridging kinematic and morphological information to provide new insights into its formation and evolution.
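For readers unfamiliar with the input data, a velocity dispersion map stores, for each pixel, the spread of line-of-sight velocities of the particles projected onto that pixel. The sketch below is a minimal illustration under stated assumptions, not the speakers' pipeline: the function name `velocity_dispersion_map`, the unweighted population standard deviation, and the square field of view centred on the origin are all choices made here for illustration.

```python
import statistics
from collections import defaultdict


def velocity_dispersion_map(particles, n_pix, extent):
    """Return an n_pix x n_pix grid of line-of-sight velocity dispersions.

    particles: iterable of (x, y, v_los) tuples, positions in the same unit as extent.
    extent: half-width of the square field of view, centred on the origin.
    Pixels holding fewer than two particles are set to None.
    """
    pixel_size = 2.0 * extent / n_pix
    velocities = defaultdict(list)
    for x, y, v_los in particles:
        col = int((x + extent) // pixel_size)
        row = int((y + extent) // pixel_size)
        # Ignore particles that fall outside the field of view.
        if 0 <= col < n_pix and 0 <= row < n_pix:
            velocities[(row, col)].append(v_los)

    grid = [[None] * n_pix for _ in range(n_pix)]
    for (row, col), v in velocities.items():
        if len(v) >= 2:
            # Unweighted population standard deviation (an assumption of this sketch).
            grid[row][col] = statistics.pstdev(v)
    return grid


# Three hypothetical particles: two share a pixel, one sits alone.
particles = [(-0.5, -0.5, 100.0), (-0.4, -0.6, 300.0), (0.5, 0.5, 50.0)]
grid = velocity_dispersion_map(particles, n_pix=2, extent=1.0)
print(grid)
```

Repeating this for several viewing directions of the same cluster would give the kind of projection variations mentioned in the abstract.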

16.30 – 17.00: Coffee break

17.00 – 18.00: Wrap-up and visit to MareNostrum

Thursday, May 15th, 2025

9.00 – 10.00: Introduction to Data Visualization for researchers by Guillermo Marin (BSC)

Guillermo Marin is the lead of the Scientific Visualization and Storytelling research line at the Visualization Group of the Barcelona Supercomputing Center. His interests are in cinematic data visualization, data conversion pipelines, and high-performance visualization. He has also been a lecturer and trainer in several workshops and in graduate and master's programs over the years. He is currently Associate Professor of Data Visualization at the Universitat Autonoma de Barcelona.

Effective data visualizations can help researchers explore, understand, and communicate their results in ways that are clear, understandable, and impactful. In this regard, visualizations have the potential to enhance the reach and engagement of complex topics. However, researchers and scientists often lack formal training in creating effective visualizations and tend to rely on outdated presets and conventions. This talk will provide the tools and concepts necessary to improve the use of visualizations by introducing fundamental principles and best practices that are applicable to data visualization in general, regardless of the scientific domain or the characteristics of the data.

10.00 – 13.00: Cinematic Visualization of A&C Simulation Data with Blender by Petr Strakoš and Milan Jaroš (IT4I)

Petr Strakoš is a senior researcher and team leader in the Visualization and Virtual Reality group at IT4Innovations, VSB-TUO, specializing in high-performance computing, data visualization, and image processing. With a Ph.D. in mechanical engineering from Czech Technical University in Prague, he has contributed to numerous research projects in image processing, biomedical imaging, and simulations with HPC.

Milan Jaroš is a computational scientist specializing in computer graphics and visualization, with a Ph.D. from VŠB - Technical University of Ostrava. As a researcher at IT4Innovations, he focuses on high-performance computing, GPU acceleration, and virtual reality, contributing to cutting-edge projects in image processing and simulations with HPC.

We introduce a visualization workflow aimed at producing cinematic-quality renderings of outputs from astrophysics and cosmology (A&C) simulation codes. This workflow uses Blender and its Cycles rendering engine to enhance visualization capabilities. The hands-on session will cover the visualization of SPH data from A&C simulation codes, integrating OpenVDB, geometry nodes, and Cycles shaders to enable a high-fidelity rendering experience.

11.00 – 11.30: Coffee break

13.00 – 14.00: Final conclusion and light lunch