Derived Text Formats (DTF): Research Beyond Copyright Barriers (Text+)

Copyright law often permits scientific analyses of large, contemporary text collections, but it blocks many of the open‑science practices that make research transparent, reproducible, and reusable. Derived Text Formats (DTF) address this problem: they are automatically transformed versions of the original texts from which the copyright‑protected material has been removed, while the features needed for research in fields such as Digital Humanities (DH) and Natural Language Processing (NLP) remain intact. The transformed texts can then be shared freely with other scientists.

How DTF are built

Four basic operations are used, each of which can be applied at different levels of granularity and scope (e.g. a single word, a whole sentence, a paragraph, an entire work, or an entire corpus):

Operation	What it does	Example use
Delete	Removes selected pieces of text.	Delete all spoken lines from a play; the remaining text is no longer protected, but can still be used for network analysis or language modelling.
Substitute	Replaces selected pieces with something else (e.g., a placeholder).	Replace every proper name with “NAME” for anonymisation.
Retain	Keeps only the parts that are needed and discards everything else.	Keep only the count of each word (“bag‑of‑words”); this list of frequencies can be used for authorship attribution.
Randomise	Changes the order of larger units such as sentences.	Randomly shuffle the sentences of a large corpus; if the corpus is big enough and the individual sentences are not themselves protected, the resulting text is considered copyright‑free.
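
The four operations can be illustrated with a minimal Python sketch. It is not the Text+ implementation: the function names, the regex-based word splitting, and the hand-supplied list of names are all assumptions made for illustration, and running such code does not by itself settle whether a result is free of copyright.

```python
import random
import re
from collections import Counter


def delete_spans(text, pattern):
    """Delete: remove every match of a regular expression (hypothetical helper)."""
    return re.sub(pattern, "", text)


def substitute_names(text, names, placeholder="NAME"):
    """Substitute: replace each name from a supplied list with a placeholder."""
    if not names:
        return text
    # Longest names first, so "Anna Maria" is matched before "Anna".
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.sub(rf"\b(?:{alternatives})\b", placeholder, text)


def retain_word_counts(text):
    """Retain: keep only lower-cased word frequencies (bag-of-words)."""
    return Counter(re.findall(r"\w+", text.lower()))


def randomise_sentences(sentences, seed=None):
    """Randomise: return a shuffled copy of a list of sentences."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return shuffled


sample = "Anna met Ben in Berlin. Ben laughed. Anna left."
print(substitute_names(sample, {"Anna", "Ben"}))
# prints: NAME met NAME in Berlin. NAME laughed. NAME left.
print(retain_word_counts(sample)["anna"])
# prints: 2
```

Each function corresponds to one row of the table. In practice the granularity (word, sentence, whole work) and the method for finding names or protected passages would need to be chosen per corpus, for example with a named-entity recogniser rather than a fixed list.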

Why DTF matter

By providing a structured, legally safe representation of texts, DTF allow researchers from linguistics, digital humanities, language technology, and any other field to work with the data they need without violating copyright. In short, DTF make it possible to share and reuse text‑based research material while staying within the law.

More here:

https://text-plus.org/en/themen-dokumentation/atf/

Humanities@NFDI: Working Together for Sustainable Research Data

by Kall Kefle | May 29, 2026 | Collaboration, Cultural change, Humanities and Social Sciences, Success Story, Support

Cross-Disciplinary Collaboration for Preserving Cultural Heritage

Humanities@NFDI brings together four NFDI consortia to ensure the long-term accessibility and reuse of research data in the humanities and cultural sciences. Through shared standards, vocabularies, and community-driven activities, the initiative fosters interdisciplinary collaboration and strengthens digital cultural heritage research.

QualidataNet by KonsortSWD-NFDI4Society is the “central point of entry” for qualitative data and its secondary use.

by Kall Kefle | May 12, 2026 | Collaboration, Humanities and Social Sciences, Infrastructure, Success Story, Tools, Training & Education

QualidataNet – Making Qualitative Research Data Visible and Reusable
QualidataNet is the central access point for the reuse, archiving, and research data management of qualitative research data. Its search portal improves the visibility and discoverability of qualitative datasets from different providers. Through practical guidance, tools such as the open-source anonymization tool QualiAnon, and contributions to international metadata standards, QualidataNet supports researchers, educators, and institutions working with qualitative data. At the same time, the network fosters collaboration, exchange, and a stronger culture of qualitative data reuse across the community.

Forum4MICA – Making Information Commonly Available (KonsortSWD | NFDI4Society)

by Kall Kefle | May 12, 2026 | Humanities and Social Sciences, Infrastructure, Success Story, Support

Forum4MICA – Making Research Data Knowledge Accessible Together
Forum4MICA connects researchers and research data centers on one central platform. It provides a space to ask questions, exchange expertise, and discuss complex datasets from the social, behavioral, educational, and economic sciences. Through direct interaction with experts and the research community, the platform is building a sustainable knowledge archive for research data management and scientific collaboration.
