National Astronomical Observatories, Chinese Academy of Sciences

Miscellaneous Information

Abstract Reference: 31043
Identifier: P8.28
Presentation: Poster presentation
Key Theme: 8 Other

Astronomical big data testing of astronomical data analysis software and systems

Zhou Lixiao

Big data is growing rapidly, and big data and the products that target it are among the current hot trends in technology. According to IBM, 90% of the world’s data has been created in the past 2 years, and with big data comes bad data. According to analyst firm IDG, 70% of enterprises have either deployed or are planning to deploy big data projects and programs. According to Evans Data Corporation, 19.2% of big data application developers say that data quality is the biggest problem they consistently face. Analyst firm Gartner says the average organization loses $14.2 million annually through poor data quality. An Experian Data Quality report states that 99% of organizations have a data quality strategy in place and that 75% of businesses are wasting 14% of revenue due to poor data quality.
Astronomy has been one of the first areas of science to embrace and learn from big data. The amount of data we have on our universe is doubling every year thanks to big telescopes and better light detectors. Most leading research is based on data from a handful of very expensive telescopes located in deserts, on mountain tops or on satellites in near-Earth space. For example, at a modest cost of around $100 million and still thriving today, the SDSS telescope produces about 200 GB of data every night, adding to a database that stands at around 50 TB today. The scientific impact of the SDSS has been phenomenal: over 3000 papers on topics ranging from comets to cosmology, which have generated over 160000 citations to date. NASA’s Hubble Space Telescope (HST) has had roughly 3 times the impact, but at roughly 100 times the cost. Undoubtedly, data quality is the foundation of leading scientific findings. With the rapid development of space astronomy, many astronomical satellite projects are now in operation or being planned.
How, then, do we measure the quality of data, particularly when it is unstructured or generated through statistical processes? How do we confirm that highly concurrent systems are free of deadlocks and race conditions? What tools should be used? It is imperative that software testers understand that big data is about far more than simply data volume. This paper will analyze and summarize the definition, characteristics, process, types, methods, strategy, tools, problems and challenges of both general big data testing and astronomical big data testing of astronomical data analysis software and systems.