DIACHRON: Preserving the Evolving Data Web: Making Open / Linked Data Diachronic

DIACHRON aims to address several issues that arise from the evolution of data: (a) monitoring the changes of Linked Open Data (LOD) datasets (tracking the evolution); (b) identifying the cause of the evolution of the datasets with respect to the real-world evolution of the entities the datasets describe (provenance problem); (c) repairing various data deficiencies (curation problem); (d) assessing the temporal and spatial quality of the harvested LOD datasets and determining which dataset versions need to be preserved (appraisal); (e) archiving multiple versions of data and citing them accordingly, so that previous data can be referenced (archiving and citation); and (f) retrieving and querying previous versions (time-traveling queries).

The DIACHRON solution aims not only to store previous versions in case they are needed in the future, but also to create a live repository that captures and highlights data evolution by keeping all data (current and previous) accessible, combined with a toolset that handles the full life cycle of the Data Web.
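To make change tracking, version archiving, and time-traveling queries more concrete, the following is a minimal Python sketch. It is not DIACHRON's actual design or API: the class `VersionedDataset`, its methods `commit` and `as_of`, and the example triples are all hypothetical, and a dataset is simplified to a set of (subject, predicate, object) tuples. Each new version is stored as a delta against the previous one, and any past version can be rebuilt by replaying the deltas up to a given date.

```python
from datetime import date


class VersionedDataset:
    """Hypothetical archive that stores each version as a delta (illustration only)."""

    def __init__(self):
        # Chronological log of (date, added triples, removed triples).
        self._log = []

    def as_of(self, when):
        """Time-traveling query: rebuild the dataset as it stood on `when`."""
        triples = set()
        for stamp, added, removed in self._log:
            if stamp > when:
                break
            triples -= removed
            triples |= added
        return frozenset(triples)

    def commit(self, when, triples):
        """Archive a new version and return the detected changes (added, removed)."""
        if self._log and when <= self._log[-1][0]:
            raise ValueError("versions must be committed in chronological order")
        current = self.as_of(when)
        new = frozenset(triples)
        added, removed = new - current, current - new
        self._log.append((when, added, removed))
        return added, removed


# Made-up example data: one value of one entity changes between two versions.
ds = VersionedDataset()
ds.commit(date(2020, 1, 1), {("ex:item", "ex:value", "10"), ("ex:item", "ex:type", "ex:Thing")})
added, removed = ds.commit(date(2021, 1, 1), {("ex:item", "ex:value", "12"), ("ex:item", "ex:type", "ex:Thing")})

print("added:", sorted(added))
print("removed:", sorted(removed))
print("as of mid-2020:", sorted(ds.as_of(date(2020, 6, 1))))
```

Storing deltas rather than full copies is only one possible choice; it keeps the change log explicit at the cost of replaying it for each historical query.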


The Web has not only caused a revolution in communication; it has also completely changed the way we gather and use data. Open data (data that is available to everyone) is growing exponentially, and it has transformed the way we conduct any kind of research or scholarship; it has changed the scientific method. The recent development of Linked Open Data has only increased the possibilities for exploiting public data. Given the value of open data, how do we preserve it for future use? Currently, much of the data we use, e.g. demographic records, clinical statistics, personal and enterprise data, as well as many scientific measurements, cannot be reproduced.

However, there is overwhelming evidence that we should keep such data where it is technically and economically feasible to do so. Until now, this problem has been approached by keeping the information in fixed data sets and by extending the standard methods of disseminating and archiving traditional (paper) artifacts. Given the complexity, the interlinking, and the dynamic nature of current data, especially Linked Open Data, radically new methods are needed. DIACHRON tackles this problem with a fundamental assumption: that the processes of publishing and preserving data are one and the same. Data are archived at the point of creation, and archiving and dissemination are synonymous.

DIACHRON takes on the challenges of evolution, archiving, provenance, annotation, citation, and data quality in the context of Linked Open Data and modern database systems. DIACHRON intends to automate the collection of metadata, provenance, and all forms of contextual information, so that data are accessible and usable at the point of creation and remain so indefinitely. The results of DIACHRON are evaluated in three large-scale use cases: open governmental data life cycles, large enterprise data intranets, and scientific data ecosystems in the life sciences.