ESAC - European Space Astronomy Centre, European Space Agency, Spain

Miscellaneous Information

Abstract Reference: 30763
Identifier: O1.2
Presentation: Oral communication
Key Theme: 4 Long-term Management of Data Archives 

The EUCLID archive system: a data-centric approach to big data

Nieto Sara, de Teodoro Pilar, Salgado Jesus, Buenadicha Guillermo, Altieri Bruno, Hoar John, Belikov Andrey, Williams Owen, Boxhoorn Danny, Droege Bob, McFarland John, Tsyganov Andrey, Valentijn Edwin, Daban Christophe

Euclid is ESA's M2 (second medium-class) mission and a milestone in understanding the geometry of the Universe. From a data-processing point of view, Euclid faces two main challenges. First, it must achieve unprecedented accuracy to meet its scientific goals. Second, it will depend heavily on the processing and reprocessing of ground-based data, which will form the bulk of the stored data volume. In total, Euclid will produce up to 26 PB of observations per year. The Euclid Archive System (EAS) is at the core of the Euclid Science Ground Segment. It supports the processing and storage of Euclid data, from the raw frames through to science-ready images and catalogues.
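
To put the 26 PB per year figure in perspective, the average sustained rate it implies can be worked out as below. This assumes decimal petabytes (1 PB = 10^15 bytes), a 365.25-day year and data accumulating evenly over the year; it is an average at the stated upper bound, not a peak ingestion rate.

```latex
\frac{26 \times 10^{15}\,\mathrm{B}}{365.25 \times 86\,400\,\mathrm{s}}
  = \frac{2.6 \times 10^{16}\,\mathrm{B}}{3.156 \times 10^{7}\,\mathrm{s}}
  \approx 8.2 \times 10^{8}\,\mathrm{B\,s^{-1}} \approx 0.82\,\mathrm{GB\,s^{-1}},
\qquad
\frac{26 \times 10^{15}\,\mathrm{B}}{365.25\,\mathrm{d}}
  \approx 7.1 \times 10^{13}\,\mathrm{B\,d^{-1}} \approx 71\,\mathrm{TB\,d^{-1}}.
```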


The Euclid Archive System consists of three components. The Data Processing System (DPS) provides a centralised metadata store to support data processing, while the Distributed Storage System (DSS) stores the data files. For long-term preservation, the EAS will provide access to the most valuable scientific metadata through the Science Archive System (SAS). The SAS is being built at the ESAC Science Data Centre (ESDC), which is responsible for developing and operating the scientific archives of ESA's astronomy, planetary and heliophysics missions. Focused on the needs of the scientific community, the SAS will make this metadata available through a series of public data releases.
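
To make the division of responsibilities concrete, the following Python sketch models the three components as in-memory classes: the DSS holds file contents, the DPS holds metadata including each file's DSS location, and the SAS receives only the scientific metadata selected for a public release. All class, method, field and identifier names are hypothetical, and the sketch deliberately ignores distribution, persistence and access control.

```python
from dataclasses import dataclass, field


@dataclass
class DistributedStorageSystem:
    """Hypothetical DSS stand-in: stores file contents keyed by a storage path."""
    files: dict[str, bytes] = field(default_factory=dict)

    def put(self, path: str, data: bytes) -> str:
        self.files[path] = data
        return path  # the location is handed back so the DPS can record it

    def get(self, path: str) -> bytes:
        return self.files[path]


@dataclass
class DataProcessingSystem:
    """Hypothetical DPS stand-in: centralised metadata, including DSS file locations."""
    metadata: dict[str, dict] = field(default_factory=dict)

    def register(self, product_id: str, science_meta: dict, dss_path: str) -> None:
        # Scientific metadata and operational metadata (the file location) live side by side.
        self.metadata[product_id] = {**science_meta, "dss_path": dss_path}


@dataclass
class ScienceArchiveSystem:
    """Hypothetical SAS stand-in: scientific metadata grouped by public data release."""
    releases: dict[str, list[dict]] = field(default_factory=dict)

    def publish(self, release: str, dps: DataProcessingSystem, product_ids: list[str]) -> None:
        # Copy only scientific metadata; the operational dss_path field stays in the DPS.
        self.releases[release] = [
            {k: v for k, v in dps.metadata[pid].items() if k != "dss_path"}
            | {"product_id": pid}
            for pid in product_ids
        ]


# Usage with made-up identifiers.
dss = DistributedStorageSystem()
dps = DataProcessingSystem()
sas = ScienceArchiveSystem()

path = dss.put("/hypothetical/vis_frame_001.fits", b"raw frame bytes")
dps.register("VIS-001", {"instrument": "VIS", "level": "calibrated"}, path)
sas.publish("DR-example", dps, ["VIS-001"])
print(sas.releases["DR-example"])
# [{'instrument': 'VIS', 'level': 'calibrated', 'product_id': 'VIS-001'}]
```

In this sketch the file location is recorded only in the DPS and is stripped when metadata is published to the SAS, reflecting the split between processing-oriented and science-oriented metadata rather than any confirmed EAS interface.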


The DPS implements the object-oriented Euclid Common Data Model, which describes both the scientific data (data products generated by pipelines) and the processing and operational metadata. The latter includes processing and data-distribution orders, the locations of files in the DSS, and processing plans. The content of the DPS is mapped to the SAS, which implements the relational Science Exploitation Data Model, optimised for scientific exploration.
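
The mapping from an object-oriented model to a relational one can be illustrated with a small Python sketch using the standard-library sqlite3 module: a nested product object (a header plus a list of sources) is flattened into a product table and a source table linked by a declared foreign key, which then supports declarative SQL queries across products. The classes, tables and columns are hypothetical and do not reproduce the actual Euclid Common Data Model or Science Exploitation Data Model; SQLite simply stands in for whatever relational database the SAS uses.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical catalogue entry nested inside a data product."""
    source_id: int
    ra_deg: float
    dec_deg: float


@dataclass
class CatalogueProduct:
    """Hypothetical object-oriented data product: a header plus nested sources."""
    product_id: str
    pipeline: str
    sources: list[Source]


def load_into_relational(conn: sqlite3.Connection, product: CatalogueProduct) -> None:
    """Flatten one nested product into two related tables (one product, many sources)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product (product_id TEXT PRIMARY KEY, pipeline TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source ("
        " source_id INTEGER PRIMARY KEY,"
        " product_id TEXT REFERENCES product(product_id),"
        " ra_deg REAL, dec_deg REAL)"
    )
    conn.execute("INSERT INTO product VALUES (?, ?)", (product.product_id, product.pipeline))
    # Each nested Source object becomes one row carrying its parent's key.
    conn.executemany(
        "INSERT INTO source VALUES (?, ?, ?, ?)",
        [(s.source_id, product.product_id, s.ra_deg, s.dec_deg) for s in product.sources],
    )


conn = sqlite3.connect(":memory:")
load_into_relational(
    conn,
    CatalogueProduct(
        "CAT-001", "hypothetical-pipeline", [Source(1, 150.10, 2.20), Source(2, 150.30, 2.25)]
    ),
)
# The relational form supports declarative queries, e.g. a simple RA range cut joined to its product.
rows = conn.execute(
    "SELECT s.source_id, p.pipeline FROM source s JOIN product p USING (product_id)"
    " WHERE s.ra_deg BETWEEN 150.0 AND 150.2"
).fetchall()
print(rows)  # [(1, 'hypothetical-pipeline')]
```

Flattening nested objects into keyed tables trades the natural hierarchy of the object model for the ability to filter and join across many products with standard SQL, which is the kind of access a science archive typically needs.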


We review the architectural design of the system, the implementation progress and associated tests, and the main challenges in building the EAS.