Buss Jens
Contact
- TU Dortmund - Astroparticle Physics
- Address: Germany
Abstract Reference: 30557
Identifier: O12.4
Presentation: Oral communication
Key Theme: 3 New Trends in HPC and Distributed Computing
FACT-Tools (Demo) – Processing High-Volume Telescope Data
Authors:
Buss Jens, Bockermann Christian, Rhode Wolfgang, Ruhe Tim, Morik Katharina, Adam Jan, Max Ahnen,
Several large experiments such as MAGIC, FACT, VERITAS, HESS, or the upcoming CTA project deploy high-precision Cherenkov telescopes to monitor celestial objects. The First G-APD Cherenkov Telescope (FACT) is pioneering the use of solid-state photo detectors for imaging atmospheric Cherenkov telescopes. Since October 2011, the FACT collaboration has successfully demonstrated the applicability and reliability of silicon photomultipliers for ground-based gamma-ray astronomy.
The amount of data collected by modern Cherenkov telescopes poses significant challenges for data storage and data analysis. These challenges range from domain-specific physics aspects, such as finding good filtering algorithms and parameters for background rejection, to scalability issues, which require analytical software to be scaled to large clusters of compute nodes for effective real-time analysis.
Modern cluster environments, which emerged from the Big Data community, provide distributed data storage with a strong emphasis on data locality and fault-tolerant computing. These clusters perfectly match the requirements of modern data-driven physics experiments. However, programming them demands expert knowledge to gain the full performance advantages at the user level.
In a joint effort of physicists and computer scientists, we targeted this area of conflict using the generic streams framework, a pluggable data processing environment developed at the Collaborative Research Center SFB-876. Using streams allows for the high-level design of analytical data flows while maintaining compatibility with large-scale streaming platforms. This enables physicists to develop and test new algorithms in a local environment and deploy their solutions on modern compute clusters without modification.
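To make the processor abstraction concrete, here is a minimal sketch of a user-defined processing stage in Java. It assumes the streams API exposes a `stream.Data` item (a map of named, serializable values) and a single-method `stream.Processor` interface; the class name and the data keys ("photoncharge", "size") are invented for illustration and need not match the actual FACT-Tools identifiers.

```java
import java.io.Serializable;

import stream.Data;
import stream.Processor;

/**
 * Illustrative processing stage: sums the per-pixel photon charge of a
 * camera image into a single "size" feature. The data keys used here
 * are placeholders, not necessarily the real FACT-Tools keys.
 */
public class SizeFeature implements Processor {

    @Override
    public Data process(Data item) {
        // Each Data item behaves like a Map<String, Serializable>.
        Serializable pixels = item.get("photoncharge");
        if (!(pixels instanceof double[])) {
            return item; // pass items without pixel data through unchanged
        }
        double sum = 0.0;
        for (double charge : (double[]) pixels) {
            sum += charge;
        }
        item.put("size", sum);
        return item;
    }
}
```

Because such a stage depends only on the Processor contract, the same class can be tested locally and then run, unchanged, inside a distributed streams container.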
Using the streams framework, we built a processing library for designing a data pipeline for the FACT telescope. The resulting FACT-Tools provide a rapid-prototyping environment for the development of any data processing stage required within FACT. The tool suite supports reading raw camera output and applying various data cleaning and feature extraction stages. The integration of popular machine learning libraries additionally enables smart filtering of relevant events to suppress background noise. The abstract modelling of data pipelines allows for efficient data processing on large-scale clusters within the Apache Hadoop ecosystem.
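In production, streams pipelines are declared in an XML container definition; purely to illustrate their sequential semantics, the sketch below applies an ordered list of stages to one toy event in plain Java. The lambda stages, the `DataFactory.create()` call, and the convention that a `null` result drops an event reflect our reading of the streams API and are assumptions, not verified FACT-Tools behavior.

```java
import java.util.Arrays;
import java.util.List;

import stream.Data;
import stream.Processor;
import stream.data.DataFactory;

/**
 * Conceptual sketch of a pipeline: each Data item passes through an
 * ordered list of stages (calibrate, extract features, filter). All
 * stages here are trivial placeholders for the real FACT-Tools ones.
 */
public class PipelineSketch {

    public static void main(String[] args) {
        // Processor is a single-method interface, so simple stages
        // can be written as lambdas.
        Processor calibrate = item -> { item.put("calibrated", true); return item; };
        Processor extract = new SizeFeature(); // stage from the previous sketch
        Processor filter = item -> ((double) item.get("size") > 50.0) ? item : null;

        List<Processor> pipeline = Arrays.asList(calibrate, extract, filter);

        // One toy "event" with three pixels; real input would be read
        // from a stream over raw FACT camera files.
        Data event = DataFactory.create();
        event.put("photoncharge", new double[] { 10.0, 25.5, 30.0 });

        for (Processor stage : pipeline) {
            event = stage.process(event);
            if (event == null) break; // a dropped event leaves the pipeline
        }
        System.out.println(event); // survives the cut: size = 65.5 > 50.0
    }
}
```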
Abstract Reference: 31323
Identifier: D2
Presentation: Demo Booth organisational
Key Theme: 3 New Trends in HPC and Distributed Computing
FACT-Tools (Demo) – Processing High-Volume Telescope Data
Authors:
Buss Jens
The First G-APD Cherenkov Telescope (FACT) is an Imaging Atmospheric Cherenkov Telescope (IACT) located on the island of La Palma near the summit of the Roque de los Muchachos. FACT's purpose is the monitoring of bright TeV blazars in the northern sky in order to detect states of high activity of these sources. Moreover, it is pioneering the use of silicon photomultipliers in gamma-ray astronomy.
In order to coordinate multi-wavelength campaigns, other experiments need to be alerted quickly in case of flaring sources. At the telescope site, a real-time analysis processes the raw data and applies machine learning algorithms online to distinguish signal events from background events. A web interface built with the D3 JavaScript library visualizes the data, displaying the telescope status, data rates, and analysis results in real time. The high performance requirements of the real-time analysis are met by a distributed data streaming engine, the streams framework. It is a modular data streaming environment that works in conjunction with popular Big Data solutions for distributed computing such as Apache Spark. The streams framework explicitly models the data and control flow as a graph of nodes and edges, which can be visualized in a dashboard-like graphical user interface.
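A hedged sketch of this classification step may help: the processor below turns previously extracted image features into a signal/background label by cutting on a score. The hand-written logistic score stands in for the pre-trained model used in the real system, and the feature keys and the 0.5 threshold are invented for illustration.

```java
import stream.Data;
import stream.Processor;

/**
 * Illustrative real-time filtering stage: derives a signal/background
 * label from previously extracted image features. The logistic score
 * below is a stand-in for a pre-trained classifier; keys and threshold
 * are invented for illustration.
 */
public class SignalClassifier implements Processor {

    private double threshold = 0.5; // cut on the score, exposed as a parameter

    @Override
    public Data process(Data item) {
        double size = (double) item.get("size");
        double width = (double) item.get("width");
        double length = (double) item.get("length");

        // Stand-in for model.predict(features): bright, narrow showers
        // score closer to 1.0 (gamma-like).
        double score = 1.0 / (1.0 + Math.exp(-(0.01 * size - 2.0 * width / length)));

        item.put("score", score);
        item.put("label", score > threshold ? "signal" : "background");
        return item;
    }

    // Our understanding is that bean-style setters like this are bound
    // to attributes of the XML pipeline definition in streams, so the
    // cut can be tuned without recompiling.
    public void setThreshold(double threshold) {
        this.threshold = threshold;
    }
}
```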
In addition to the real-time analysis, offline computations on the telescope data have to be performed, e.g., to test new reconstruction algorithms on older data or to apply advanced analysis techniques. Some of these tasks can be computationally too expensive to run in real time with the limited resources available on the Roque de los Muchachos. Here, the interoperability of the streams framework with modern large-scale Big Data frameworks such as Apache Hadoop or Spark is a big advantage.
This demo will show how the data analysis pipeline for FACT's data is set up with the streams framework. Progress and state can be monitored via a dashboard that also visualizes the data flow. Moreover, the demo will show a mockup of the running system as well as the aforementioned status website with real-time analysis results.