A first approach to an ELSA Curriculum for Data Scientists
The FAIR Data Spaces Project as a Use Case
- When? Wednesday, 14 June 2023, 2-4 PM
- Where? Online
- Language: English
The workshop is organized within the framework of the BMBF-funded project “FAIR Data Spaces” and is a follow-up to a first ELSA Workshop held last year. It aims to present the first version of a proposed ELSA Curriculum for Data Scientists and to present the FAIR Data Spaces demonstrators as use cases for implementing such a curriculum, showcasing the multifaceted nature of the project as an interdisciplinary endeavour.
The workshop will kick off with presentations and lightning talks from participants of the FAIR Data Spaces project. Afterwards, an invited talk will introduce a practical framework that helps teams build trustworthy AI systems and data strategies by combining expertise and training from philosophy, law, machine learning and design.
The workshop will conclude with an open discussion round. Audience participation is highly encouraged.
1. Welcome-Introduction (Daniela Mockler, NFDI)
2. Towards an ELSA Curriculum for Data Scientists – A first approach (Maria Christoforaki, UzK)
- Presentation of the general concept and the proposed content of an ELSA Curriculum for Data Scientists. We will also discuss implementation issues such as program duration, means of content delivery, and evaluation methods.
3. The FAIR Data Spaces Demonstrators as ELSA Curriculum Use Cases
- Workflow-Based Spatio-Temporal Data Analytics (Nikolaus Glombiewski, University of Marburg)
- In the biodiversity domain, researchers often have to combine a large variety of heterogeneous spatio-temporal data sources. For example, the loss of biodiversity can be quantified by analyzing occurrence observations of various species across time. To find the root cause of that loss, occurrence data may need to be combined with satellite images to find possible correlations with climate variables. To facilitate an exploratory approach for this combination of data sources, it is essential to provide researchers with workflow-based tools such that each step during the formulation of a research hypothesis can be tracked. In this presentation, we will discuss Geo Engine, a workflow-based analysis platform for spatio-temporal data analytics, and its place within FAIR data spaces.
- Automated Data Quality Assurance with GitLab pipelines and Docker (Jonathan Hartman, RWTH Aachen)
- The Data Validation and Quality Assurance demonstrator is a tool designed to assist with collaborative research projects, ensuring that collected data complies with established schemas and monitoring these files for unusual or unexpected values. The demonstrator aims to show how a task of this nature can be carried out in a scalable, cloud-based infrastructure while maintaining compatibility with existing frameworks. In this presentation we will cover a brief use case, showing the results of the demonstrator on an example dataset that contains some intentionally added data quality items.
- Unlocking the Potential of Health Data: A Distributed Analysis Approach based on Personal Health Train Infrastructure (Macedo Maia, Leipzig University)
- While many medical datasets are available, they are usually focused on a specific research question. This is useful for making experiments transparent and reproducible. However, these datasets could be used more efficiently in other kinds of analyses, were it not for data privacy issues. The Personal Health Train (PHT) provides a distributed analysis infrastructure that follows the FAIR principles and gives data owners (providers) control over how their data are used by scientists or other users (consumers).
4. Data Privacy and Intellectual Property issues illustrated by the FAIR Data Spaces Demonstrators
- GDPR and the principle of purpose limitation in connecting Data Spaces demonstrators (Constantin Breß, FIZ Karlsruhe)
- The presentation focuses on legal challenges arising from the GDPR when connecting Data Spaces using the Demonstrators. Special attention is given to the GDPR's principle of purpose limitation. The Demonstrators play a crucial role in connecting the data spaces of the NFDI and GAIA-X. The data in the two data spaces is usually gathered and used for completely different purposes. Thus, when the spaces are connected, the data may be processed for purposes other than those for which it was collected. The GDPR, when applicable, limits the legitimate purposes of such further processing. Even if a purpose is legitimate, the GDPR imposes different obligations on controllers and processors. The presentation will examine the applicability of the GDPR to the demonstrators and the roles of controller and processor with regard to them, and will discuss the principle of purpose limitation and the obligations connected to it.
- Research Data Quality Assurance – an intellectual property perspective (Jonas Kuiter, University of Münster)
- The presentation will summarize the intellectual property issues regarding the second Demonstrator in the FAIR Data Spaces project, Research Data Quality Assurance. In particular, it will outline the specific copyright law provisions that are or may be relevant for this demonstrator and the participants in the project.
5. Towards a practical framework for “ethics by design” data sharing and machine learning applications (Jona Boeddinghaus, DAIKI)
- Responsible AI and data-driven applications can only be developed when teams integrate the ethical principles directly into the development process. An important prerequisite is the involvement of a diverse group of stakeholders who build and are affected by AI and data systems. We present a practical framework that helps teams build trustworthy AI systems and data strategies by combining expertise and training from philosophy, law, machine learning and design.
6. General discussion / Q&A
- Maria Christoforaki is a computer scientist who works at the Institute of Biomedical Informatics of the University Hospital of Cologne. In the framework of the FAIR DS project she is responsible for coordinating the development of an ELSA curriculum for data scientists.
- Nikolaus Glombiewski is a researcher at the database research group at the University of Marburg, where he also received his Master’s degree in computer science. Currently, he is working on spatio-temporal data processing in FAIR data spaces.
- Jonathan Hartman is currently a data analyst and developer for the Research Process and Data Management department at RWTH Aachen University. Prior to this, he worked as an analyst and project consultant at the Ford School of Public Policy’s “Education Policy Initiative” at the University of Michigan in the US, where he also received his Master’s degree in Data Analytics.
- Macedo Maia is a PhD student in Artificial Intelligence (AI) Systems and a data scientist in the Medical Data Science (MDS) group at Leipzig University. He collaborates with the FAIR Data Spaces (FAIR-DS) project and works on distributed data analysis and the application of intelligent systems in healthcare.
- Constantin Breß is a legal researcher in the intellectual property rights department at FIZ Karlsruhe. He is currently focusing on the application of the GDPR to technologies for connecting data spaces and to data trust models. Constantin earned his Staatsexamen in law at the University of Passau, Germany.
- Jonas Kuiter is currently a research assistant at the ITM at WWU Münster, an institute for intellectual property, entertainment law and telecommunications law headed by Professor Dr. Thomas Hoeren. Within this framework, he is active in the FAIR DS project. Prior to this, he received his law degree at the University of Hannover, where he also worked in the legal department of an insurance company.
- Jona Boeddinghaus is co-founder and managing director of multiple software and machine learning companies. With over 20 years of experience in software development and artificial intelligence, he is an expert in software and data architecture, machine learning and AI ethics. Jona earned his Magister Artium degree in Philosophy (with a master’s thesis on the philosophy of mind), Computer Science, and Cognitive Science from the University of Freiburg, Germany. He is an experienced consultant and CEO of DAIKI.