The Data Seal of Approval board hereby confirms that the Trusted Digital repository IDS Repository complies with the guidelines version 2010 set by the Data Seal of Approval Board.
The aforementioned repository has therefore acquired the Data Seal of Approval of 2010 on April 9, 2013.
The Trusted Digital repository is allowed to place an image of the Data Seal of Approval logo corresponding to the guidelines version date on their website. This image must link to this file which is hosted on the Data Seal of Approval website.
The Data Seal of Approval Board
|Guidelines Version:||2010 | June 1, 2010|
|Guidelines Information Booklet:||DSA-booklet_2010.pdf|
|All Guidelines Documentation:||Documentation|
|Seal Acquiry Date:||Apr. 09, 2013|
|For the latest version of the awarded DSA for this repository please visit our website:||
|Previously Acquired Seals:||
|This repository is owned by:||
(a) Metadata: Every resource must be provided in a standardized format or with exhaustive documentation of the proprietary format. At least for the resource as a whole, a minimum set of metadata in Dublin Core (DC:title, DC:description, DC:publisher and/or DC:creator, DC:legalStatus) must be provided. Moreover, comprehensive documentation must be provided that describes, depending on the resource, the provenance of the data, the curation procedure, necessary tools, formats, and a bibliography of publications about the resource.
If the resource consists of several parts, for example a collection of papers, provision of metadata for the individual parts in appropriate form is strongly encouraged. Ideally, these metadata are provided in CMDI, but other forms, such as well-documented comma-separated tables from which CMDI metadata can be generated, are accepted.
(b) Quality Assurance: Only resources that comply with CLARIN guidelines or are created in peer-reviewed scientific projects (with respect to scientific and scholarly quality) are considered for deposit. The depositor is required to sign an agreement stating that these guidelines are met (see also DSA Guideline 5).
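The minimum Dublin Core set named under (a) lends itself to a simple automated check. The following is a minimal sketch, not the repository's actual procedure: the dictionary-based record layout, the function name, and the "publisher|creator" label are assumptions made for illustration only.

```python
# Field names follow the list given under (a); the record layout
# (a plain dict keyed by Dublin Core element name) is an assumption.
REQUIRED_FIELDS = ("title", "description", "legalStatus")
AGENT_FIELDS = ("publisher", "creator")  # at least one must be present


def missing_dc_fields(record):
    """Return the required Dublin Core fields that are absent or empty."""

    def present(name):
        value = record.get(name)
        return bool(value and str(value).strip())

    missing = [name for name in REQUIRED_FIELDS if not present(name)]
    if not any(present(name) for name in AGENT_FIELDS):
        # Hypothetical label meaning "neither publisher nor creator given".
        missing.append("publisher|creator")
    return missing
```

An empty result means the record carries the minimum set; any other result lists what the depositor would still need to supply.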
Data sharing and reuse are promoted by providing free access to the data (download, web services) and metadata (via the OAI-PMH protocol). The CLARIN infrastructure contains software components such as the VLO (http://www.clarin.eu/vlo/) which enable users to browse and search through combined catalogs that contain metadata of all CLARIN repositories.
The IDS Repository recommends using formats listed in the CLARIN standard recommendations (http://www.clarin.eu/recommendations). The encoding for textual sources (plain text, XML, etc.) should be Unicode. In addition, for spoken corpora, the following formats are currently accepted:
The FOLKER data format (Documentation in German, XML Schema)
The EXMARaLDA data format (Documentation, DTDs)
For other formats we offer advice on conversion. However, as a general principle we also archive digital data in their original format in order to minimize the risk of conversion loss.
Following general CLARIN standards, metadata for the IDS Repository must be provided in the CMDI format with unique references to the actual resources. Comprehensive documentation on how to create CMDI-compliant metadata profiles and instances is available at http://www.clarin.eu/cmdi.
Metadata files (instances) can be created with any standard XML editor, e.g. ARBIL (https://www.clarin.eu/faq/technical-infrastructure/standards/metadata/arbil-as-cmdi-editor), which comes with CMDI support. Additionally, a set of tools is provided that allows data producers to create new metadata or adapt existing metadata to the CMDI standard. This includes customizable transformation scripts for converting existing metadata in a variety of formats (Dublin Core, generic XML, comma-separated tables) to CMDI, and for extracting metadata from text data.
The granularity of CMDI metadata and objects is chosen by the (meta)data producer. The IDS Repository itself is able to handle a high granularity of metadata and objects.
Metadata elements must comply with the standards set in CMDI. Since CMDI is a component-based approach that allows (meta)data producers to create custom-tailored metadata profiles, there is no limit on the use of established standards. To be visible and usable in the CLARIN infrastructure, CMDI metadata added to the IDS Repository need to contain a minimum set of attributes (linked to data categories stored in ISOcat), which is enforced by the following quality checks as part of the automated ingest and delivery procedures of the IDS Repository:
1. Validation against CMDI Schemas before the ingest.
2. Integrity check for all referenced data.
3. Generation of an actionable URL for all CMDI records and data, and registration of the URL in a handle system (http://hdl.handle.net/).
4. Validation based on the validation procedures of the underlying Fedora-Commons backend.
5. Validation of CMDI Records delivered by the OAI Provider, using the underlying validation of the Fedora-Commons PROAI provider.
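The first two checks in the list above can be approximated with a short sketch. It only tests well-formedness and whether every referenced resource is known; it does not perform the schema validation of step 1, which would need a validating XML parser. The namespace constant assumes CMDI 1.1 records, and the function names and the set of known resources are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of CMDI 1.1 records; assumed here for illustration.
CMD_NS = "http://www.clarin.eu/cmd/"


def resource_refs(cmdi_xml):
    """Parse a CMDI record and return the text of every ResourceRef element.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the record
    is not well-formed XML.
    """
    root = ET.fromstring(cmdi_xml)
    refs = root.iter("{%s}ResourceRef" % CMD_NS)
    return [(ref.text or "").strip() for ref in refs]


def unresolved_refs(cmdi_xml, known_resources):
    """Return references that are empty or not among the known resources."""
    return [
        ref
        for ref in resource_refs(cmdi_xml)
        if not ref or ref not in known_resources
    ]
```

A record would pass this simplified integrity check when `unresolved_refs` returns an empty list.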
The URL: https://www.clarin.eu/faq/technical-infrastructure/standards/metadata/arbil-as-cmdi-editor creates an error message. Can you please check it and update the URL if needed?
The mission of the IDS Repository is to serve as the repository of a CLARIN-D resource center. The mission of CLARIN-D is to provide “linguistic data, tools and services in an integrated, interoperable and scalable infrastructure for the social sciences and humanities” (http://de.clarin.eu/en/home-en.html). Such a resource center therefore has to operate a repository in which data, tools, and the corresponding metadata are archived on a long-term basis.
This mission is in line with the general mission of IDS-Mannheim (Satzung des Instituts für Deutsche Sprache, §2(1)), which states: "The foundation pursues the purpose of scientifically researching and documenting the German language in its contemporary use and its more recent history. It cooperates with other national and international institutions with a similar goal, and also provides scientific services."
The IDS Repository is part of the CLARIN infrastructure and thus does not carry out promotional activities on its own, but is embedded into such activities on CLARIN-D and the European CLARIN level. These activities include but are not limited to:
- Providing comprehensive information on the CLARIN mission through websites (clarin.eu, de.clarin.eu).
- Operation and maintenance of the Virtual Language Observatory (VLO), which lets end users search for data and tools (based on the metadata provided by the resource centers/repositories that are part of CLARIN).
- Presenting data, tools and services provided by CLARIN at conferences.
- Organization of dissemination conferences that aim to get in touch with the user communities of CLARIN.
- Organization of training courses.
The IDS Repository is not a legal entity of its own. It is run by IDS-Mannheim, which is an institution governed by public law.
Depositors must sign an agreement stating that they respect IPR (Intellectual Property Rights) and privacy issues and that they own all rights required to deposit the data. In particular, data must be anonymized when applicable. Users must confirm that they will use resources only in the intended way. The depositor can choose to make the data publicly available, to restrict access to academics via AAI (Authentication and Authorization Infrastructure), or to restrict access to individual users.
Examples for the declaration of consent of interviewees in the FOLK Corpus are available:
Declaration of Consent FOLK audio recordings (in German) 
Declaration of Consent FOLK video recordings (in German) 
 http://repos.ids-mannheim.de/tou.html (visited: April 8, 2013)
The IDS Repository runs on a virtual server hosted by the IDS-Mannheim. Maintenance of the virtual server is performed by a team of trained personnel. Access to the virtual server is restricted by a firewall. The storage hardware and the hardware for virtual machines are replaced at regular intervals to keep them at the state of the art.
The IDS Repository, that is, data and operating system, is backed up incrementally Monday through Thursday. On Fridays, either a full backup (4th Friday) or a differential backup (1st, 2nd, 3rd, and 5th Fridays) is performed. Backups have a retention period of three months and are stored on disk on a dedicated backup server.
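One possible reading of this schedule can be written down as a small function. This is a sketch under stated assumptions, not the actual backup configuration: it assumes that Fridays are counted within each calendar month, that no backup runs on weekends, and the function name and return labels are hypothetical.

```python
from datetime import date


def backup_type(day):
    """Classify the backup run on a given date under the assumed schedule.

    Assumptions: Fridays are numbered within the calendar month, and
    nothing runs on Saturday or Sunday (the source does not say).
    """
    weekday = day.weekday()  # Monday = 0 ... Sunday = 6
    if weekday <= 3:
        return "incremental"
    if weekday == 4:
        nth_friday = (day.day - 1) // 7 + 1
        return "full" if nth_friday == 4 else "differential"
    return None
```

For example, under these assumptions `backup_type(date(2013, 4, 26))` falls on the fourth Friday of April 2013 and is classified as a full backup.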
In the future, the IDS plans to keep a mirror of the most valuable data with a third party (Mannheim University), but legal, technical, and financial issues still need to be settled.
The IDS Repository virtual machine, the backup server, and other critical infrastructure are monitored with Icinga (network and service monitoring software).
Measures are taken to improve the chance of future interpretability of the data. The number of accepted file formats is restricted (see DSA Guideline 2) to make future conversions to other formats more feasible. Open (non-proprietary) file formats are used as much as possible. For textual resources, XML formats are used whenever possible, so that the files can still be interpreted even if the tool that was used to create them no longer exists. Text is encoded in Unicode to ensure future interpretability.
When a particular file format is in danger of becoming obsolete, appropriate curation steps take place.
All resources in the IDS Repository (metadata and actual data) are equipped with a checksum, which is checked on a regular basis in coordination with the backup schedule described in DSA Guideline 6.
Technical Workflows: The IDS Repository uses Fedora Commons as its underlying repository system. The ingest workflows of the IDS Repository are built on top of the batch ingest utilities provided by Fedora Commons. As detailed in DSA Guideline 3, extensive technical validation and automated curation take place when ingesting CMDI metadata and the underlying data.
Overall Workflows: The general goal of the IDS Repository is to sustainably archive linguistic resources (corpora and tools) compiled and developed at IDS-Mannheim together with their metadata. In addition, IDS-Mannheim aims to provide archival services to academic researchers and institutions according to its basic mission (Satzung des Instituts für Deutsche Sprache, §2(1)). Selection of resources and decisions about archiving are governed by institutional best practices, balancing provenance, utility, and funding.
The data provider retains all intellectual property rights to their data. The depositor must grant distribution rights to the IDS Repository and choose an access model (public, academic, individuals). Access models are provided by the repository and distribution rights are specified in the data provision contract. In cases of misuse, enforcing licenses against data users is the responsibility of the property rights owner.
Crisis management is based on the technical solutions described in DSA Guideline 6. In addition, the IDS Repository archives all metadata and data in such a way that they can be easily migrated to and mirrored at other CLARIN resource centers. All metadata and data have a persistent identifier (PID) and are stored as self-contained XML files. Legal aspects of relocating data to another institution are addressed by templates of license agreements provided in CLARIN.
Harvesting of metadata is possible via OAI-PMH. Local search facilities are provided on the basis of the search interface of Fedora Commons (http://repos.ids-mannheim.de/fedora/objects). In addition, all CMDI metadata are harvested via OAI-PMH by the Virtual Language Observatory (VLO: http://www.clarin.eu/vlo/), which provides a central starting point when searching for resources in the CLARIN infrastructure. For some resources, “deep search” is supported by means of the CLARIN Federated Content Search (http://www.clarin.eu/fcs) interface.
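To illustrate what an OAI-PMH harvesting request looks like, the sketch below builds a ListRecords request URL. The verb and the argument names come from the OAI-PMH protocol; the base URL used in the example and the metadata prefix "cmdi" are assumptions, since the text does not state the repository's OAI endpoint or the prefixes it offers.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when a harvester
    continues a partial list, it sends the token without metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        raise ValueError("either metadata_prefix or resumption_token is required")
    return base_url + "?" + urlencode(params)
```

A harvester would call this once with a metadata prefix, for instance `list_records_url("http://example.org/oai", "cmdi")` with a placeholder endpoint, and then repeat the call with each resumption token returned by the server until none is left.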
The IDS has acquired a Handle prefix and runs its own Handle server for persistent identifiers. The IDS plans to have its prefix mirrored by EPIC and is currently negotiating this with EPIC. The IDS Repository does not offer a persistent identifier service of its own but relies on the IDS Handle server. The use of PIDs is mandatory for resources and their CMDI metadata in CLARIN; thus all resources added to the repository can be referenced using PIDs.
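A Handle PID consists of a prefix and a suffix separated by a slash, and it becomes an actionable URL when appended to the Handle proxy address (http://hdl.handle.net/) mentioned under DSA Guideline 3. The sketch below shows this mapping; the function names are hypothetical, and the handle in the usage note is the one from the example CMDI record cited elsewhere in this document.

```python
HANDLE_PROXY = "http://hdl.handle.net/"


def split_handle(handle):
    """Split a handle into (prefix, suffix) at the first slash."""
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError("not a handle of the form prefix/suffix: %r" % handle)
    return prefix, suffix


def actionable_url(handle):
    """Return the proxy URL through which the handle can be resolved."""
    split_handle(handle)  # raises ValueError for malformed handles
    return HANDLE_PROXY + handle
```

For the handle `10932/00-017B-E0F5-4DD7-4D01-F`, `split_handle` yields the prefix `10932`, and `actionable_url` yields the same address that the document gives for the example record.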
The integrity of the data is ensured by the version control in the Fedora-Commons backend. Metadata is a data stream within the digital object, and as such is version controlled like object data. CLARIN promotes the idea of reproducible research. Thus updates or new versions of resources are typically assigned a new PID. Only marginal changes to CMDI metadata are versioned without registering a new PID.
Part of the archiving workflow is the integrity check of the data and the metadata by the archive manager. This is done both manually and automatically. The metadata is parsed for syntactic correctness and manually evaluated for completeness and soundness. The object data is tested for syntactic correctness if possible. All datastreams and versions are equipped with an MD5 checksum, which is checked in coordination with the backups as described in DSA Guideline 6.
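The checksum comparison described here can be sketched with Python's standard library. This is an illustration of the general technique only; how Fedora Commons stores and verifies its checksums is not shown, and the function names are hypothetical.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hexadecimal MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 digest of a file without loading it into memory."""
    with open(path, "rb") as stream:
        return md5_of_chunks(iter(lambda: stream.read(chunk_size), b""))


def checksum_matches(path, expected_hex):
    """Compare a file's current digest with the checksum recorded at ingest."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in fixed-size chunks keeps memory use constant, which matters for large audio or video datastreams.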
In principle, the repository makes the original deposited objects available unmodified, provided the objects are in one of the accepted file types and encodings. If the data producer changes the data, the repository creates a new digital object with a new PID. If the repository itself has to change the data, e.g. because a file format becomes obsolete and is superseded, the original data are kept.
The repository only accepts works from the original data producers, who are acknowledged as such by means of the or elements in Dublin Core, or equivalent elements with corresponding ISOcat categories in CMDI. We use CMDI relations (depending on the profile) to link between objects within a collection and to provide links from objects to additional information. An example CMDI record for the "Mannheimer Korpus historischer Zeitungen und Zeitschriften" is available at: http://hdl.handle.net/10932/00-017B-E0F5-4DD7-4D01-F.
External deposits are only accepted after a due diligence process involving a check of the identity of depositors and clarification of all legal issues along the lines described in DSA Guideline 5.
The repository complies with the OAIS reference model’s tasks and functions. Moreover, the repository uses the Fedora Commons software, which is compliant with the Reference Model for an Open Archival Information System (OAIS) due to its ability to ingest and disseminate Submission Information Packages (SIPs) and Dissemination Information Packages (DIPs) in standard container formats.
The data consumer has direct access to the archived objects via the web, provided that access requirements have been met.
A more detailed description of the IDS Repository Functional Architecture along the OAIS reference model and Ingest Pipelines is available at http://repos.ids-mannheim.de/reposdescription.html.
 Reference Model for an Open Archival Information System (OAIS), Recommended Practice, CCSDS 650.0-M-2 (Magenta Book) Issue 2, June 2012 http://public.ccsds.org/publications/archive/650x0m2.pdf
 Functional Architecture and Ingest Pipelines. http://repos.ids-mannheim.de/reposdescription.html
All CMDI metadata are provided without access restrictions according to CLARIN-D policies.
Part of the actual data is also provided without access restrictions, but a significant part is protected. For some data, a Shibboleth account is necessary; for other data, a personal account is required. For some data sets, explicit permission from the depositor is needed. For a large part of the data, the data consumer needs to agree to a code of conduct, which also contains licensing terms.
An example of a protected resource is DeReKo. Access to a large part of the actual data of DeReKo is only possible via COSMAS II (http://cosmas2.ids-mannheim.de/). In order to access DeReKo via COSMAS II, an end user license agreement has to be signed (http://www.ids-mannheim.de/cosmas2/projekt/registrierung/). For some sub-corpora of DeReKo, access is further restricted to IDS-internal use only (see http://www.ids-mannheim.de/kl/projekte/korpora/archiv.html for a list).
Some smaller parts of DeReKo are also available for download. These are licensed under Creative Commons (CC-BY-SA), namely the Wikipedia corpora (wpd, wpd11, wdd11) and the corpus "Reden und Interviews" (rei) (see http://www.ids-mannheim.de/kl/projekte/korpora/verfuegbarkeit.html#Download). Further corpora that are available for download (mk1, mk2, bzk) are under a special license that allows non-commercial scientific use only and prohibits their redistribution.
Data depositors need to make sure that IPR and personality rights are respected in their deposited data. They specify an appropriate licence that data consumers need to accept. Data are protected by an AAI and only become available once the licence has been accepted.
In addition, the IDS Repository requires data consumers to comply with the DFG code of conduct for good scientific practice.
Just out of curiosity: Have you implemented a means to check if consumers really comply with the DFG code of conduct?
The system does not allow ingest of data into the repository without the specification of access criteria and without providing an appropriate licence. These license conditions are displayed to users in the CMDI metadata and must be accepted before obtaining access to the data.
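The ingest rule stated here amounts to a precondition check. The sketch below is purely illustrative and is not the repository's implementation: the access model labels are taken from the three options named under DSA Guideline 5 (public, academic, individuals), and the function name and message strings are hypothetical.

```python
# Labels for the three access models; the exact spellings are assumptions.
ACCESS_MODELS = ("public", "academic", "individual")


def ingest_problems(access_model, licence):
    """Return the reasons an ingest would be refused (empty list if none)."""
    problems = []
    if access_model not in ACCESS_MODELS:
        problems.append("access model missing or unknown")
    if not (licence and licence.strip()):
        problems.append("licence missing")
    return problems
```

An ingest pipeline following this rule would proceed only when the returned list is empty.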