FAIR Data Spaces Demonstrators

Demonstrators

Since late 2021, FAIR Data Spaces has been developing demonstrators for a biodiversity data space, research data quality assurance, and cross-platform data analytics. In these demonstrators, existing and new infrastructure and data services interoperate in accordance with the Gaia-X and FAIR principles.

More information about the technical implementations can be found here.

In addition, three subcontractors from industry were selected in a tender process to further develop these demonstrators and build new ones: Accurids, which manages distributed data through centralized data management; Geo Engine, a cloud-enabled data science platform for spatio-temporal data processing; and expandAI, which enriches standard applications with AI.

More information on these new projects can be found in our July newsletter.

The long-term demonstrators are listed below.

Demonstrator on Biodiversity

In collaboration with Geo Engine GmbH, a Gaia-X compatible demonstrator based on Geo Engine is being developed. Geo Engine is a cloud-based research environment that connects data sources and lets researchers process spatio-temporal data interactively and visually. The FAIR-DS demonstrator supports scalable, cloud-based access to data provided by NFDI4Biodiversity. It builds on the Gaia-X Cloud specifications, which provide both technical and legal frameworks for data exchange. The first use case in FAIR Data Spaces combines data from industry (satellite data) with data from science (GFBio). More information on the Geo Engine software component is available here.

Demonstrator for FAIR Research Data Quality Assurance and Workflows

The purpose of this demonstrator is to show how decentralized task runners can be used for automated quality control and data assurance in a widely available or easily deployed environment. To do so, the demonstrator uses the workflow engine provided by the source code hosting platform GitLab to analyze, transform, and verify research data artifacts. Given a schema, the demonstrator checks newly added data for compatibility and issues a warning if the schema is violated. An incompatible dataset can thus be cleaned up quickly and then integrated smoothly into existing datasets. More information.
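The schema check described above can be illustrated with a minimal sketch. It is not the demonstrator's actual implementation: the schema layout, the field names, and the functions `check_record` and `check_dataset` are assumptions made for illustration. A task runner could call a script of this kind whenever new data is added and surface the returned warnings.

```python
from typing import Any

# Hypothetical schema (an assumption for this sketch): field name -> expected type.
SCHEMA: dict[str, type] = {"species": str, "latitude": float, "longitude": float}


def check_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of warnings for one record; an empty list means compatible."""
    warnings = []
    for name, expected in schema.items():
        if name not in record:
            warnings.append(f"missing field '{name}'")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            warnings.append(
                f"field '{name}' expected {expected.__name__}, got {actual}"
            )
    for name in record:
        if name not in schema:
            warnings.append(f"unexpected field '{name}'")
    return warnings


def check_dataset(
    records: list[dict[str, Any]], schema: dict[str, type]
) -> dict[int, list[str]]:
    """Map the index of each incompatible record to its warnings."""
    report = {}
    for index, record in enumerate(records):
        warnings = check_record(record, schema)
        if warnings:
            report[index] = warnings
    return report


if __name__ == "__main__":
    new_data = [
        {"species": "Parus major", "latitude": 51.5, "longitude": 7.4},
        {"species": "Erithacus rubecula", "latitude": "north"},
    ]
    for index, warnings in check_dataset(new_data, SCHEMA).items():
        for warning in warnings:
            print(f"WARNING record {index}: {warning}")
```

Returning warnings rather than raising an exception mirrors the behaviour described in the text: the incompatible dataset is flagged so it can be cleaned up, not rejected outright.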

Demonstrator for cross-platform data analysis

The goal of this demonstrator is to reuse the current results of NFDI (in particular NFDI4Health) and MII in terms of medical data structures, formats, and ethical and legal requirements, while also remaining compatible with Gaia-X specifications. To this end, a cross-platform data analytics infrastructure called Personal Health Train (PHT) is used. The key elements of the PHT ecosystem are the so-called Trains and Stations, by analogy with railway trains and the stops they call at. Trains encapsulate analytics tasks using container technologies and contain everything required to query the data, run the algorithm, and store the results. Stations act as data providers that manage data sets. To analyze the decentralized data, a specific train is sent to each station in turn. At each station, the train performs the analysis task and computes the results (e.g. statistics) based on the locally available data. More information is available here.
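The Train and Station pattern can be sketched in a few lines of Python. This is a simplified illustration, not the PHT software: real Trains are packaged with container technologies, whereas here a Train is a plain Python object, and the names `Station`, `Train`, `accumulate`, and `dispatch` are hypothetical. The point of the sketch is that only aggregated results travel with the train, while the raw records stay at each station.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Train:
    """Carries an analysis task and the results accumulated so far."""
    task: Callable[[list[float], dict], dict]
    results: dict = field(default_factory=lambda: {"count": 0, "total": 0.0})


@dataclass
class Station:
    """A data provider; its local records are never handed to the caller."""
    name: str
    local_records: list[float]

    def execute(self, train: Train) -> None:
        # The task runs against local data; only the aggregate is stored.
        train.results = train.task(self.local_records, train.results)


def accumulate(local_values: list[float], results: dict) -> dict:
    """Example task: update a running count and sum with the local values."""
    return {
        "count": results["count"] + len(local_values),
        "total": results["total"] + sum(local_values),
    }


def dispatch(train: Train, stations: list[Station]) -> Train:
    """Send the train to each station in turn."""
    for station in stations:
        station.execute(train)
    return train


if __name__ == "__main__":
    stations = [Station("A", [1.0, 2.0, 3.0]), Station("B", [4.0, 6.0])]
    train = dispatch(Train(task=accumulate), stations)
    mean = train.results["total"] / train.results["count"]
    print(f"mean over {train.results['count']} records: {mean}")
```

Carrying a count and a sum, rather than a per-station mean, lets the overall mean be computed correctly after the last stop even when stations hold different numbers of records.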

ELSA Training for Data Scientists

The goal of this task is to develop a set of core elements for ELSA training, i.e., training for data scientists on ethical, legal, and social aspects. Such training helps data scientists identify and mitigate the relevant challenges that may arise during the phases of a data science project.

The first step was to analyze and describe the current ELSA training landscape as manifested in existing policies, courses, programs, and curricula. In addition, the profile of current and future data scientists was described. The results were presented at the first FAIR-DS ELSA workshop and are among the publications in the FAIR Data Spaces Community on Zenodo.

In a series of in-depth workshops in 2022 with a variety of domain experts, both from within and outside FAIR-DS, material was collected for a first version of an ELSA curriculum for data scientists. Based on this material, the first curriculum version is currently being completed. As a next step, the curriculum will be submitted to Gaia-X industry partners and the broader community for comment. Based on this feedback, an ELSA curriculum will be proposed as an outcome of this work package.

Further work packages

Roadmapping and Community

Legal and Ethical Framework

Technical Foundations

Grant agreement FAIRDS