PASC Conference
PASC24 Conference: June 3 to June 5, 2024

PASC18 – Video of Leonardo Bautista Gomez on Easy and Efficient Multilevel Checkpointing for Extreme Scale Systems

In this video from PASC18, Leonardo Bautista Gomez from the Barcelona Supercomputing Center presents: Easy and Efficient Multilevel Checkpointing for Extreme Scale Systems.

“Extreme scale supercomputers offer thousands of computing nodes to their users to satisfy their computing needs. As the need for massively parallel computing increases in industry, computing centers are being forced to increase in size and to transition to new computing technologies. While the advantage for the users is clear, such evolution imposes significant challenges, such as energy consumption and reliability. In this talk, we will discuss how to guarantee high reliability to high performance applications running in extreme scale supercomputers. In particular, we cover the tools necessary to implement scalable multilevel checkpointing for tightly coupled applications. This includes an overview of failure types and frequency in current HPC systems. The talk will also cover the theoretical analysis necessary to achieve optimal utilization of the computing resources. Moreover, we will discuss the internals of the FTI library tool, to study how multilevel checkpointing is implemented today.”

Thanks to Rich Brueckner from insideHPC Media Publications for recording the video.

© 2025 PASC Conference