Enabling Reproducibility for Small and Large Scale Research Data Sets

Authors: 
Allan Hanbury
Type: 
Electronic journal or CD contribution
Proceedings: 
Publisher: 
Pages: 
Year: 
2017
ISBN: 
Abstract: 
A large portion of scientific results is based on analysing and processing research data. For an eScience experiment to be reproducible, we need to be able to identify precisely the data set that was used in a study. With evolving data sources this can be a challenge, as studies often use subsets extracted from a potentially large parent data set. Exporting and storing subsets in multiple versions does not scale to large amounts of data. To tackle this challenge, the RDA Working Group on Data Citation has developed a framework and a set of recommendations that allow precise subsets of evolving data sources to be identified based on versioned data and timestamped queries. In this work, we describe how this method can be applied in small scale research data scenarios and how it can be implemented in large scale data facilities with access to sophisticated data infrastructure. We describe how the RDA approach improves the reproducibility of eScience experiments, and we provide an overview of existing pilots and use cases in small and large scale settings.
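The abstract names the core idea only briefly: versioned data plus timestamped queries. The sketch below is a minimal illustration under stated assumptions, not the RDA reference implementation or the authors' code. It assumes a small in-memory table in which updates and deletes close a record version instead of overwriting it. A cited subset is then stored as a normalized query, its execution timestamp and a checksum of its result set, so that the same subset can be re-created and verified later. All names (`VersionedStore`, `QueryStore`, `cite`, `resolve`) are hypothetical, and the short hash identifier only stands in for a persistent identifier that a real PID service would assign.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    record_id: str
    value: dict
    valid_from: int           # time at which this version became current
    valid_to: Optional[int]   # None while the version is still current

class VersionedStore:
    """Append-only table: updates and deletes close the current version, never overwrite it."""
    def __init__(self) -> None:
        self.versions: list[Version] = []

    def _current(self, record_id: str) -> Optional[Version]:
        for v in self.versions:
            if v.record_id == record_id and v.valid_to is None:
                return v
        return None

    def upsert(self, record_id: str, value: dict, ts: int) -> None:
        cur = self._current(record_id)
        if cur is not None:
            cur.valid_to = ts  # close the old version; it stays queryable
        self.versions.append(Version(record_id, value, ts, None))

    def delete(self, record_id: str, ts: int) -> None:
        cur = self._current(record_id)
        if cur is not None:
            cur.valid_to = ts  # mark as deleted; the row itself is kept

    def as_of(self, ts: int) -> list[Version]:
        """All record versions that were visible at time ts."""
        return [v for v in self.versions
                if v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to)]

def normalize(query: dict) -> str:
    # Canonical form so that equivalent queries produce the same string.
    return json.dumps(query, sort_keys=True)

def result_hash(rows: list[Version]) -> str:
    # Order-independent checksum of the result set (rows sorted by record id).
    pairs = sorted([[r.record_id, r.value] for r in rows], key=lambda p: p[0])
    return hashlib.sha256(json.dumps(pairs, sort_keys=True).encode()).hexdigest()

@dataclass
class QueryRecord:
    query: str        # normalized query
    ts: int           # execution timestamp
    result_hash: str  # checksum used to verify re-execution

class QueryStore:
    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.records: dict[str, QueryRecord] = {}

    def _run(self, query: dict, ts: int) -> list[Version]:
        # Simple equality filter on the fields named in the query.
        return [v for v in self.store.as_of(ts)
                if all(v.value.get(k) == val for k, val in query.items())]

    def cite(self, query: dict, ts: int) -> tuple[str, list[Version]]:
        rows = self._run(query, ts)
        norm = normalize(query)
        # Hypothetical identifier standing in for a PID from a real PID service.
        qid = hashlib.sha256(f"{norm}|{ts}".encode()).hexdigest()[:16]
        self.records[qid] = QueryRecord(norm, ts, result_hash(rows))
        return qid, rows

    def resolve(self, qid: str) -> list[Version]:
        # Re-execute the stored query against the data as it was at citation time.
        rec = self.records[qid]
        rows = self._run(json.loads(rec.query), rec.ts)
        if result_hash(rows) != rec.result_hash:
            raise ValueError("re-executed result does not match the cited subset")
        return rows

# Usage: cite a subset, then change the data; the cited subset is still recoverable.
store = VersionedStore()
store.upsert("r1", {"station": "A", "temp": 20.1}, ts=1)
store.upsert("r2", {"station": "A", "temp": 19.4}, ts=1)
store.upsert("r3", {"station": "B", "temp": 18.0}, ts=1)
qs = QueryStore(store)
qid, rows = qs.cite({"station": "A"}, ts=2)
store.upsert("r1", {"station": "A", "temp": 20.3}, ts=3)  # later correction
store.delete("r2", ts=4)                                  # later deletion
```

The point of this design is that only the query and a few bytes of metadata are stored per cited subset, rather than an exported copy of the subset itself. In a production setting the versioning would more likely live in the database (for example, history tables or timestamp columns), with the query store kept alongside it.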
TU Focus: 
Information and Communication Technology
Reference: 

S. Pröll, A. Rauber:
"Enabling Reproducibility for Small and Large Scale Research Data Sets";
D-Lib Magazine, 23 (2017), 1/2; 6 pp.

Additional Information

Last changed: 
24.10.2017 12:37:31
Accepted: 
Accepted
TU Id: 
262235
Invited: 
Department Focus: 
Business Informatics
Author List: 
S. Pröll, A. Rauber
Abstract German: