Object-based multi-level file caching for massively parallel supercomputers

Francisco Javier Garcia Blas
Seminar

Recent advances in storage technologies and high-performance interconnects have made it possible in recent years to build increasingly powerful storage systems that serve thousands of nodes. Most storage systems of the clusters and supercomputers on the TOP500 list are managed by one of three scalable parallel file systems: GPFS, PVFS, or Lustre. Parallel applications currently suffer from a significant imbalance between computational power and available I/O bandwidth. Additionally, the hierarchical organization of current Petascale systems and of the envisioned Exascale platforms increases the latency of the I/O subsystem. In these hierarchies, file access involves pipelining data through several networks, each of which adds latency and raises the probability of congestion.

We present a novel generic parallel I/O architecture for both clusters and supercomputers. Our design is aimed at large-scale parallel architectures with thousands of compute nodes. Besides acting as middleware for existing parallel file systems, our architecture provides online virtualization of storage resources. Our solution is based on a multi-tier cache architecture and on asynchronous data staging strategies that hide the latency of data transfers between cache tiers. This work aims to reduce the file access latency perceived by data-intensive parallel scientific applications through multi-layer asynchronous data transfers. To this end, our techniques leverage multi-core architectures by overlapping computation with communication and I/O in parallel threads. Prototypes of our solutions have been deployed on both clusters and Blue Gene supercomputers.
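As a rough illustration of asynchronous staging between two cache tiers, the following is a minimal sketch, not the speaker's implementation: writes return as soon as data reaches a fast in-memory tier, while a background thread stages blocks to a slower lower tier, so the application thread can keep computing. The class name `AsyncStagingTier`, the block-based interface, and the use of a plain `dict` to stand in for the lower tier (e.g. an I/O node or the parallel file system) are all hypothetical assumptions.

```python
import queue
import threading


class AsyncStagingTier:
    """Hypothetical write-back cache tier with asynchronous staging to a lower tier."""

    def __init__(self, lower_tier):
        self.lower_tier = lower_tier   # assumption: a dict standing in for the next, slower tier
        self.cache = {}                # fast tier (e.g. compute-node memory)
        self.lock = threading.Lock()
        self.pending = queue.Queue()   # block ids waiting to be staged
        self.worker = threading.Thread(target=self._stage, daemon=True)
        self.worker.start()

    def write(self, block_id, data):
        # Returns once data is in the fast tier; staging to the lower tier is asynchronous.
        with self.lock:
            self.cache[block_id] = data
        self.pending.put(block_id)

    def read(self, block_id):
        # Serve from the fast tier if possible, otherwise fall back to the lower tier.
        with self.lock:
            if block_id in self.cache:
                return self.cache[block_id]
        return self.lower_tier.get(block_id)

    def _stage(self):
        # Background thread: copies cached blocks to the lower tier in FIFO order.
        while True:
            block_id = self.pending.get()
            if block_id is None:       # shutdown sentinel
                self.pending.task_done()
                break
            with self.lock:
                data = self.cache[block_id]
            self.lower_tier[block_id] = data   # the slow transfer happens off the caller's thread
            self.pending.task_done()

    def flush(self):
        # Block until every queued write has reached the lower tier.
        self.pending.join()

    def close(self):
        self.flush()
        self.pending.put(None)
        self.worker.join()


if __name__ == "__main__":
    store = {}
    tier = AsyncStagingTier(store)
    for i in range(4):
        tier.write(i, f"block-{i}".encode())
        # ... computation could proceed here while staging runs in the background ...
    tier.close()
    print(sorted(store))
```

A single staging thread keeps the sketch simple and preserves write order; a multi-tier hierarchy would chain several such boundaries and would also need eviction and consistency policies, which are omitted here.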

Bio:
Francisco Javier Garcia Blas has been a Teaching Assistant at University Carlos III of Madrid (Spain) since 2005. He has also collaborated on several projects with researchers from various high-performance computing institutions, including HLRS (funded by the HPC-Europe program) and Argonne National Laboratory. He is currently involved in various projects on topics including parallel I/O and parallel architectures. He received an MS in Computer Science from University Carlos III of Madrid in 2007 and a PhD in Computer Science from the same university in 2010.