The University of Western Australia - International Centre for Radio Astronomy Research (ICRAR)

Miscellaneous Information

Abstract Reference: 30342
Identifier: P3.12
Presentation: Poster presentation
Key Theme: 3 New Trends in HPC and Distributed Computing

Imaging SKA-Scale Data on Cloud and Supercomputer Infrastructure using Drops and a Data Flow Management System

Vinsen Kevin, Dodson Richard, Dolensky Markus, Pallot Dave, Tobar Rodrigo, Wicenec Andreas, Chen Wu, Boulton Mark

At ADASS XXV we presented the Drop concept and our investigation into managing the processing of Square Kilometre Array (SKA) scale data volumes. In this paper we present the results of our ongoing effort to develop a Drop-based data flow management system, called Daliuge, that can scale to the expected size of SKA phase 1 using either cloud or HPC infrastructure. Daliuge is a data-driven execution framework compliant with the SKA1 system requirements and architecture. It is scalable, and its pluggable scheduling routines allow the workload to be optimised according to criteria such as data locality and hardware configuration. Daliuge is a functioning software system capable of executing a directed acyclic graph (DAG) across many hundreds of computational nodes within a cloud or HPC environment. We also present a logical graph editor and a static scheduler that translates logical graphs into the complex physical graphs (expressed as DAGs) that Daliuge executes. Most radio astronomy pipelines involve iterative algorithms that must be executed repeatedly in loops; a key feature of the graph editor is that it converts user-oriented cyclic graphs into DAGs by unrolling these loops. The graph scheduler then splits the high-level logical graph into partitions and distributes them across the nodes of the system so that the cost of data movement or resource utilisation is minimised, making it significantly more efficient to deploy and run DAGs on heterogeneous hardware. To provide real-life test scenarios we have continued testing with the CHILES data, and we have added ASKAP, MWA and LOFAR data sets to ensure that the software can accommodate different approaches to radio astronomy pipelines. We have also used simulated SKA data to bring the data volume closer to SKA phase 1 scale.
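The loop-unrolling step can be pictured with a small sketch. Assuming a logical graph in which a hypothetical three-step imaging cycle (`grid`, `invert`, `deconvolve`) repeats a fixed number of times, the function below expands the cycle into a chain of uniquely named nodes, and Python's standard `graphlib.TopologicalSorter` then produces an execution order, which it can only do for an acyclic graph. The step names, the `unroll_loop` helper and the dict-of-predecessors representation are assumptions for illustration, not the Daliuge logical-graph format or its editor's algorithm.

```python
from graphlib import TopologicalSorter


def unroll_loop(pre, body, post, iterations):
    """Expand the cyclic pattern `pre -> (body)*iterations -> post` into a DAG.

    Hypothetical helper: `body` is an ordered list of step names run once per
    iteration. Each copy of a step gets a unique name (e.g. "grid_1"), and the
    first step of iteration i depends on the last step of iteration i-1, so
    the back-edge of the loop is replaced by explicit forward edges.
    Returns a dict mapping each node to the set of nodes it depends on.
    """
    dag = {pre: set()}
    previous = pre
    for i in range(iterations):
        for step in body:
            node = f"{step}_{i}"
            dag[node] = {previous}
            previous = node
    dag[post] = {previous}
    return dag


# Hypothetical imaging cycle repeated three times.
dag = unroll_loop("load_vis", ["grid", "invert", "deconvolve"], "restore", 3)

# static_order() raises graphlib.CycleError if the graph still has a cycle.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

A real pipeline would usually also fan out within each iteration (for example across frequency channels), but the principle is the same: every iteration becomes its own set of nodes, so the cycle disappears and the graph can be scheduled as a DAG.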
Daliuge can use Drops based on files, directories, AWS S3 files, streams, NGAS, memory, databases (both SQL and NoSQL), Linux programs, Python programs, Docker containers and Bash scripts. The results are very encouraging: they show that the Drop concept is extremely flexible and that the Daliuge software is highly scalable. Because the processing is expressed as a DAG, we are finding that the need to move data is greatly reduced, a key requirement for systems working at the scale of SKA phase 1. The software is written in Python and runs under Linux. All the software is open source and available from GitHub.
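The data-driven execution model behind Drops can be sketched in a few lines: a data Drop announces when it is complete, and an application Drop runs as soon as all of its inputs have completed, writing its result into an output data Drop. The classes `DataDrop` and `AppDrop` and their methods are hypothetical names chosen for this sketch, not the Daliuge API, and everything here runs in a single process with in-memory payloads.

```python
class DataDrop:
    """Hypothetical data Drop: holds a payload and notifies consumers on completion."""

    def __init__(self, name):
        self.name = name
        self.payload = None
        self.consumers = []

    def add_consumer(self, app):
        # Link this Drop as an input of the application Drop.
        self.consumers.append(app)
        app.inputs.append(self)

    def write(self, payload):
        # Writing completes the Drop; completion is what triggers downstream work.
        self.payload = payload
        for app in self.consumers:
            app.input_completed(self)


class AppDrop:
    """Hypothetical application Drop: runs `func` once all input Drops are complete."""

    def __init__(self, name, func, output):
        self.name = name
        self.func = func
        self.output = output
        self.inputs = []
        self._completed = set()

    def input_completed(self, drop):
        self._completed.add(drop)
        if len(self._completed) == len(self.inputs):
            result = self.func(*(d.payload for d in self.inputs))
            self.output.write(result)


# Wire a tiny graph: (vis_a, vis_b) -> merge -> merged -> sum -> total
vis_a, vis_b = DataDrop("vis_a"), DataDrop("vis_b")
merged, total = DataDrop("merged"), DataDrop("total")

merge = AppDrop("merge", lambda a, b: a + b, merged)
vis_a.add_consumer(merge)
vis_b.add_consumer(merge)

summarise = AppDrop("sum", sum, total)
merged.add_consumer(summarise)

vis_a.write([1, 2])  # merge still waits for vis_b
vis_b.write([3, 4])  # completes merge's inputs, so merge runs, then sum runs
print(total.payload)  # 10
```

In this sketch no central controller steps through the graph: each Drop only knows its immediate neighbours, and execution propagates as data becomes available.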