Proceedings of the 6th International Conference on Data Science, Technology and Applications - Volume 1: DATA
In Rail Automation, planning future projects requires the integration of business-critical data from heterogeneous data sources. Consequently, the quality of the integrated data is crucial for optimal utilization of production capacity. Unfortunately, current integration approaches mostly neglect the uncertainties and inconsistencies that arise when integrating railway-specific data. To address these shortcomings, we propose a semi-automatic process for data import in which the user resolves ambiguous data classifications. The task of finding the correct data warehouse classification for source values in a proprietary, often semi-structured format is supported by the notion of a signifier, which is a natural extension of composite primary keys. In a case study from the domain of asset management in Rail Automation, we show that this approach facilitates high-quality data integration while minimizing user interaction.
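To make the idea of a signifier as an extended composite key more concrete, the following minimal sketch may help. It is not the implementation evaluated in the case study. It assumes, purely for illustration, that a signifier is a tuple of text components, that components are compared after simple normalization, and that only a unique full match is classified automatically, while partial matches are handed to the user. The names `Category`, `normalize`, `score`, and `classify`, as well as the sample categories, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """A data warehouse category (hypothetical structure)."""
    name: str
    signifier: tuple[str, ...]  # assumed: one expected value per component


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(value.lower().split())


def score(source: tuple[str, ...], category: Category) -> int:
    """Count the components of the source signifier that match the category."""
    return sum(
        1
        for s, c in zip(source, category.signifier)
        if normalize(s) == normalize(c)
    )


def classify(source: tuple[str, ...], categories: list[Category]) -> tuple[str, list[str]]:
    """Return (status, candidate category names).

    status is "automatic" for a unique full match, "ask_user" when the
    best candidates match only partially or are not unique, and
    "unmatched" when no component matches any category.
    """
    scores = {c.name: score(source, c) for c in categories}
    best = max(scores.values(), default=0)
    if best == 0:
        return ("unmatched", [])
    candidates = sorted(name for name, s in scores.items() if s == best)
    if best == len(source) and len(candidates) == 1:
        return ("automatic", candidates)
    return ("ask_user", candidates)


# Hypothetical warehouse categories, invented for this sketch.
CATEGORIES = [
    Category("Signal", ("signal", "led", "type a")),
    Category("Signal lamp", ("signal", "led", "type c")),
    Category("Point machine", ("point", "electric", "type b")),
]

if __name__ == "__main__":
    print(classify(("Signal", " LED ", "Type A"), CATEGORIES))
    print(classify(("signal", "led", "type x"), CATEGORIES))
```

In this sketch, the first call matches all three components of exactly one category and is classified without user interaction, whereas the second matches two categories equally well and is therefore deferred to the user. The strict rule for automatic classification is a design assumption chosen to keep wrong classifications out of the warehouse at the cost of somewhat more user interaction.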
Information and Communication Technology