The CoreTrustSeal Board hereby confirms that the Trusted Digital Repository CLARIN Center IvdNT complies with the guidelines version 2017-2019 set by the CoreTrustSeal Board.
The aforementioned repository has therefore acquired the CoreTrustSeal of 2016 on December 10, 2018.
The Trusted Digital Repository is allowed to place an image of the CoreTrustSeal logo corresponding to the guidelines version date on its website. This image must link to this file, which is hosted on the CoreTrustSeal website.
The CoreTrustSeal Board
Guidelines Version: 2017-2019 | November 10, 2016
Guidelines Information Booklet: DSA-booklet_2017-2019.pdf
All Guidelines Documentation: Documentation
Repository: CLARIN Center IvdNT
Seal Acquisition Date: Dec. 10, 2018
For the latest version of the awarded DSA for this repository please visit our website:
Previously Acquired Seals: None
This repository is owned by:
Please note: the Instituut voor de Nederlandse Taal / Dutch Language Institute (IvdNT) was formerly known as the Instituut voor Nederlandse Lexicologie / Institute for Dutch Lexicology (INL).
The Instituut voor de Nederlandse Taal (Dutch Language Institute; IvdNT) is a binational institute, structurally funded by the Dutch and Flemish (Belgian) governments. It applies for the CoreTrustSeal as a CLARIN B Centre within the European CLARIN ERIC. The institute is actively improving its CLARIN services through the Dutch programme CLARIAH.
Main website: https://portal.clarin.inl.nl/
The institute collects and enriches Dutch linguistic material, and makes the material available both through software applications and as downloads. The institute also accepts Dutch linguistic material collected and enriched by other institutes and universities.
The primary goal of the institute as a CLARIN B Centre is to make data available for researchers through applications. Download services are available for a wider public as well. Furthermore, the institute is tasked with archiving and maintaining the deposited materials.
The data sets are supplied with all the information that is essential for sustainable data management and future use. Data producers are encouraged to supply additional data description documents or links to publications (using persistent identifiers) about the data. The publications are stored in the repository (provided that permission is granted), while the data description documents are archived in the OAI-PMH accessible data repository and are accessible through the tools and applications.
Repository type. The institute collects and accepts Dutch linguistic material (including multilingual material) worldwide, often in direct relation to research projects.
Repository’s designated community. The institute aims to serve anyone interested in the Dutch language, for whatever reason, with a particular focus on researchers in the humanities and social sciences.
Level of Curation Performed. Our standard is to perform curation levels A (content distributed as deposited) and B (basic curation). Levels C (enhanced curation) and D (data-level curation) are performed only when additional funding is available.
Outsource partners. Most of the infrastructure is managed by the institute itself. The servers are located in a data centre of Leiden University. The institute has a contract with the university's IT department (ISSC) for location services such as firewall, switching and cabling. The institute has a long-standing relationship with Centric IT Solutions and its partners for consultancy and, occasionally, implementation; Centric IT Solutions provides services to our institute on a per-incident basis. Another important partner is BackupNed, which handles off-site storage of our back-up tapes in a bunker.
The data and applications are made available through virtual machines that can easily be transferred to another CLARIN B Centre.
Other relevant information. For many years the institute hosted a large amount of linguistic material through the project TST-Centrale / HLT-Agency: the Dutch-Flemish Human Language Technology Agency. This material was temporarily transferred to our partner Nederlandse Taalunie / Dutch Language Union (NTU), but returned to the institute in 2016. These materials have not yet been ingested into the CLARIN B Centre, but will be in the future; at present they are available through a download service. The CLARIN B Centre, however, already draws on the expertise of the HLT-Agency.
The IvdNT is funded by the governments of both Flanders and the Netherlands on a permanent basis.
The Institute is an acknowledged CLARIN B Centre and fully participates in the CLARIN ERIC. The mission statement is described in the document “Information about deposition” in our CLARIN portal. In the policy plan for the years 2013-2017 (section 3.1.5) the IvdNT states: “The CLARIN Centre, which is being built at the IvdNT within the framework of the European, Dutch and Flemish CLARIN projects, and for which cooperation with Sara and BigGrid is also involved, is structurally part of the IvdNT infrastructure. Through this CLARIN Centre, the IvdNT becomes one of the nodes in a European network that serves both linguistics and humanities and society in general. Users of that network get low-threshold access to language data, tools, and other services wherever they are and wherever the materials they use are in the network”. The CLARIN Centre is integrated with the institute and uses all of its facilities: website, newsletters, Twitter, and so on.
CLARIN B Centre / CLARIN ERIC: see above.
The repository is not a legal entity in its own right, but is part of the Instituut voor de Nederlandse Taal (Dutch Language Institute), which is a legal entity: an institute under Dutch law. It is overseen by a Supervisory Board. The Dutch Language Union functions as the umbrella organization. Both the institute and the Dutch Language Union are supervised by the Committee of the (relevant) Ministers of the Netherlands and Flanders.
If data is owned by third parties, like publishers, a project agreement makes explicit all obligations and restrictions on use.
Licenses are dealt with in detail in the document “End User License Agreement” in our CLARIN portal. In the metadata of individual datasets additional licenses can be specified (e.g. GNU, Creative Commons, etc.).
The data producer, i.e. the depositor, always remains the owner. IvdNT receives a copy of the data, which it must take good care of in accordance with the license contract and the terms and conditions of use. IvdNT also makes further copies, for example back-ups, and manages these carefully.
In an emergency we are able to rebuild the entire database from the files we have backed up and stored safely at another location.
License agreements and contracts with data providers vary considerably because the providers differ in nature (e.g. commercial publishers, publicly funded research), and they are not made public. These contracts mainly specify the conditions under which the IvdNT may make the data available to third parties (our users). Examples such as the following can be sent separately to the reviewer on request:
- Contract between the Flemish Publishing Company and IvdNT, concerning the delivery and use of digital material from the Flemish newspaper De Standaard.
- Contract between PCM Publishers and IvdNT, concerning the delivery and use of digital material from the Dutch newspaper NRC Handelsblad.
- License agreement between IvdNT and a second institute, concerning the consultation and use of the product Brieven als Buit/Letters as loot.
We also have a template (in Dutch) for agreements with depositors, which we will make available through the CoreTrustSeal Secretariat.
All Open Access data stored at IvdNT is freely accessible. Some data is restricted to Academic Use.
Access to the data is restricted in several ways. If data is accessible only to the CLARIN research community, we use the authentication and authorization mechanism of the CLARIN Service Provider Federation. Some open-access data is nevertheless offered only through web applications that allow querying for useful but limited amounts of data.
Data are stored in a state-of-the-art LAN-DMZ set-up, according to best practices, including IvdNT’s own system of authentication and authorization (e.g. Active Directory).
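The three access situations described above (freely accessible data, academic-only data behind the CLARIN federated login, and open data that can only be queried in limited amounts) can be modelled as a small policy check. The sketch below is illustrative only: the `Access` categories, the `Dataset.max_hits` cap and the `federated_login` flag are hypothetical names, not part of the IvdNT software, and actual authentication is assumed to have happened elsewhere (e.g. via the CLARIN Service Provider Federation).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    OPEN = "open"               # freely accessible
    OPEN_QUERY_ONLY = "query"   # open, but only via a web application with capped results
    ACADEMIC = "academic"       # requires a CLARIN federated login


@dataclass
class Dataset:
    name: str
    access: Access
    max_hits: Optional[int] = None  # hypothetical per-query cap for query-only data


def hits_allowed(ds: Dataset, federated_login: bool, requested: int) -> int:
    """Return how many results a user may receive; 0 means access is refused."""
    if ds.access is Access.ACADEMIC and not federated_login:
        return 0
    if ds.access is Access.OPEN_QUERY_ONLY and ds.max_hits is not None:
        return min(requested, ds.max_hits)
    return requested
```

For example, a query-only dataset with a cap of 500 returns at most 500 hits even when 10,000 are requested, while academic-only data returns nothing to an anonymous user.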
Our licenses are subject to Dutch law.
As stated before, data is made available (searchable) primarily through web applications. Distribution of the data itself does occur, but is exceptional and is restricted to historical material without any privacy issues. It is carried out by the standing IvdNT organization, under the supervision of management that has received instructions from experts in consortia such as CLARIN-NL and from legal experts. Compliance with national and international law is checked periodically by the standing organization.
We do not specify penalties for non-compliance by end users. Our actions in such cases will depend very much on the type of infringement, but will most likely involve a demand to cease and desist.
The institute is financed by the Flemish and Dutch governments on a permanent basis. In case of dissolution of the institute, the board of directors selects an entity that will receive all assets of the institute.
The financial situation is reviewed regularly and planning always covers a rolling period of five years, for example 2017-2021. The accounts are audited every year by an accountant. Work on the policy plan for the period 2018-2022 is ongoing.
The infrastructure has been set up in such a way (e.g. using virtual machines) that services can be transferred to another CLARIN B Centre with a minimum of effort. The technical details are described below.
The CLARIN B Centre IvdNT ingests data that was collected within the CLARIN ERIC and CLARIAH frameworks. Both frameworks have extensive governance, also explicitly addressing ethical and legal issues. The CLARIN ERIC, for example, has a Legal Issues Committee. The CLARIN B Centre follows the guidelines and recommendations of the CLARIN Legal Issues Committee. If access needs to be restricted to academic use, the standard CLARIN federated login will be required. The CLARIN B Centre is embedded in the IvdNT.
The IvdNT is a recognized research institute and as such adheres to the Netherlands Code of Conduct for Academic Practice. When the IvdNT collects linguistic material itself, it uses a Standard Operating Procedure based on this Code of Conduct. Moreover, the accountant of the IvdNT not only carries out financial checks, but also pays explicit attention to the quality of the workflow. Suggestions for improvement are delivered in a yearly accountant's report.
Moreover, the CLARIN B Centre and the IvdNT as a whole have implemented the rules of the General Data Protection Regulation (GDPR). Two Data Protection Officers have been appointed to monitor all data handling that might involve personal data.
The CLARIN B Centre is embedded in the IvdNT. Its mission and funding have already been discussed. The IvdNT, and therefore the CLARIN B Centre, has an abundance of trained staff: linguists, computational linguists, software developers, a systems architect and systems developers. The funding covers frequent participation of many staff members in relevant meetings and seminars. The quality and training of staff is a spearhead of the policy of the director of the IvdNT. The IvdNT has an Advisory Board consisting of persons with extensive knowledge and experience of the foundation's work areas. Members of the Advisory Board are nominated by the Supervisory Board and appointed by the Committee of Ministers of the Language Union. The Advisory Board gives solicited and unsolicited advice to the Board and to the Committee of Ministers of the Language Union on issues relating to the realization of the Foundation's objectives.
We have already pointed out the importance of the CLARIN ERIC / CLARIAH context and of the Advisory Board. CLARIN committees regularly provide relevant documents covering many issues. The CLARIN B Centre coordinator is also the National Coordinator for Flanders in the CLARIN ERIC and meets his European colleagues every month, either by video conference or face-to-face. The coordinator is also a member of the CLARIAH coordinating teams; the linguistic coordinating team meets every month. For the Netherlands, the CLARIN B Centre uses the National Coordinator for the Netherlands in the CLARIN ERIC as its liaison. The CLARIN B Centre communicates directly with researchers in Flanders, primarily by email. All of this is supplemented by regular on-site activity by key staff members.
At present, user feedback can only be given by email to our service desk 'email@example.com'.
To ensure the integrity of the data sets, an MD5 checksum is computed for every deposited file, which allows us to check the files for defects in later years. Once deposited, files in data sets are never changed, and only minor changes to the metadata are allowed (for example, spelling corrections, minor changes in documentation, or added documentation). Changes to the data themselves are issued as a new version of the dataset, which receives a new persistent identifier. Such changes are only made in close collaboration with the producer of the dataset.
The repository maintains links to other relevant materials (e.g. articles, theses, documentation, data elsewhere) and to metadata of measuring instruments (AWStats etc.) whenever applicable. The metadata records provenance information on the data, such as creator, original sources and date of creation.
The unique identity of a depositor is ensured either through personal contact or by the required login via the CLARIN Service Provider Federation.
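The fixity procedure described above, recording an MD5 checksum per file at deposit and later recomputing it, can be sketched with Python's standard `hashlib`. This is a minimal illustration, not the IvdNT tooling: the manifest format (a dict from path to hex digest) and the function names are assumptions. MD5 is adequate for detecting accidental corruption such as media defects, though it offers no protection against deliberate tampering.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(files: list[Path]) -> dict[str, str]:
    """Record a checksum for every deposited file at ingest time."""
    return {str(p): md5_of(p) for p in files}


def find_defects(manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose checksum no longer matches."""
    defects = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.exists() or md5_of(p) != expected:
            defects.append(name)
    return defects
```

Running `find_defects` periodically against the manifest stored at ingest flags any file that has changed or disappeared since deposit.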
The collection development policy is quite broad with respect to purpose: it covers all Dutch language materials that might be of interest to researchers in any discipline.
As to the quality of the data sets, there must be clear evidence of usefulness, reflected in properties such as completeness, size, structure and annotation. We do not have a formal set of rules for measuring quality.
The data sets need to be supplied with all the information that is essential for sustainable data management and future use. Data producers are encouraged to supply additional data description documents or links to publications (using persistent identifiers) about the data.
CMDI descriptions are created and maintained by repository staff using a number of standard components sufficient to assure discoverability.
Our policy is to provide data in open standards as much as possible. Data from other producers is accepted if it complies with these standards and rejected if it does not.
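A first, coarse step in enforcing this policy at ingest could be to sort a deposit by file type. The sketch below assumes a hypothetical whitelist of open-format extensions (the source does not list the accepted standards); an extension check is only a triage step and would in practice be followed by format-specific validation.

```python
from pathlib import PurePath

# Hypothetical whitelist of open formats; the actual accepted standards are not listed in the source.
ACCEPTED_EXTENSIONS = {".xml", ".txt", ".csv", ".tsv", ".json", ".wav", ".flac"}


def triage(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split a deposit into accepted and rejected files by (case-insensitive) extension."""
    accepted, rejected = [], []
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        (accepted if ext in ACCEPTED_EXTENSIONS else rejected).append(name)
    return accepted, rejected
```

For instance, `triage(["a.XML", "b.docx", "c.csv", "README"])` accepts the XML and CSV files and rejects the Word file and the extensionless README for follow-up with the depositor.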
CMDI profiles used are published here: http://catalog.clarin.eu/ds/ComponentRegistry/#
System management is carried out by the IT staff of the IvdNT according to extensive documentation and numerous Standard Operating Procedures and checklists, stored in an internal wiki.
The procedures include:
- Weekly full back-ups to tape, followed by daily incremental back-ups.
- Quarterly full back-ups to tape, stored at another location (BackupNed). These tapes are eventually reused, but one set from each year is retained for at least seven years. A restore can be carried out on request. Essential information for disaster recovery is stored on paper and digitally in a vault at a great distance from the data centre.
- Monthly installation of security patches and updates.
- Daily automated monitoring of systems and applications.
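The weekly-full/daily-incremental scheme implies that restoring a given day needs the most recent full back-up plus every incremental since. The sketch below illustrates that logic under explicit assumptions not stated in the source: full back-ups run on Sundays, and the first Sunday full back-up of January, April, July and October is the one that also goes off-site.

```python
from datetime import date, timedelta

FULL_WEEKDAY = 6                  # assumption: weekly full back-up on Sunday (Monday == 0)
OFFSITE_MONTHS = {1, 4, 7, 10}    # assumption: quarterly off-site set taken in these months


def backup_kind(day: date) -> str:
    """Classify the back-up taken on a given day under the assumed schedule."""
    if day.weekday() != FULL_WEEKDAY:
        return "incremental"
    # The first Sunday of a quarter's opening month falls within days 1-7.
    if day.month in OFFSITE_MONTHS and day.day <= 7:
        return "full+offsite"
    return "full"


def restore_chain(target: date) -> list[date]:
    """Back-ups needed to restore the state of `target`: the most recent
    full back-up on or before it, followed by every incremental since."""
    start = target
    while backup_kind(start) == "incremental":
        start -= timedelta(days=1)
    return [start + timedelta(days=i) for i in range((target - start).days + 1)]
```

The design trade-off is the usual one: daily incrementals keep nightly back-ups small, at the cost of a restore chain of up to seven tape sets.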
The IvdNT internal wiki contains a large number of documents concerning both ongoing projects and project results (applications, data). These cover CLARIN as an umbrella, software development and system management. The documents consist mostly of documentation, standard operating procedures, checklists and best practices, and together form a very detailed preservation policy. This policy is so closely tailored to the situation of the IvdNT that publishing it is not considered useful. Internal IvdNT standards (metadata, security, etc.) are also applied to external data and applications. The information in the internal wiki is confidential. The tables of contents of the most relevant sections can be sent separately to the reviewer on request.
As much as possible open file formats are used. Small volume conversions due to obsolescence of file formats will be handled. For textual resources, XML formats are used whenever possible, to make future interpretation of the files possible, even if the tool that was used to create them no longer exists.
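Since XML only supports future interpretation if files actually parse, a basic ingest check is well-formedness. The minimal sketch below uses Python's standard `xml.etree.ElementTree`; the function name is hypothetical. Note that well-formedness is weaker than validity: checking a file against a schema (e.g. TEI) would require additional tooling.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_problem(data: bytes) -> Optional[str]:
    """Return None if `data` is well-formed XML, otherwise the parser's error message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None
```

A mismatched closing tag, for example, yields a message that includes the line and column of the error, which can be reported back to the depositor.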
The IvdNT has the capability to convert data when needed and follows the guidelines published by the Consultative Committee for Space Data Systems (CCSDS) in the Reference Model for an Open Archival Information System (OAIS).
The OAIS presents a functional model consisting of six functional entities, between which a number of interactions are possible. Below, we describe how these entities are realized within the IvdNT CLARIN Centre.
1. Ingest. This entity receives data from producers. Its tasks are receiving data; performing quality assurance; checking documentation, descriptions and formats; establishing metadata; and preparing the data for archiving and data management. Implications for IvdNT CLARIN Centre: there is a Standard Operating Procedure for the ingest of data (acquisition) which covers all of these tasks.
2. Archival Storage. This entity is responsible for the systematic storage, maintenance and retrieval of the data. It further performs routine checks on media quality (refresh if necessary), errors and disaster recovery capabilities. Implications for IvdNT CLARIN Centre: the IvdNT CLARIN Centre distinguishes two separate functions. First, data management, which is responsible for storage of the data, error detection and retrieval. Second, system management, which is responsible for media quality and recoverability.
3. Data Management. This entity is responsible for the content integrity of the data. It ensures that data and descriptive information are linked and is responsible for version management. Implications for IvdNT CLARIN Centre: these responsibilities are part of the previously mentioned data management function.
4. Administration. Oversees all archiving operations. Negotiates submission agreements with Producers. Establishes policies for maintenance, standards and hardware and software planning, customer support, etc. Implications for IvdNT CLARIN Centre: the coordinator of IvdNT CLARIN Centre, together with management of IvdNT, is responsible for developing these policies.
5. Preservation planning. Evaluates the quality of the content and the quality of the service in the context of the user community. Signals developments in technology and use patterns and provides policies to upgrade the archive service accordingly. Also provides migration planning. Implications for IvdNT CLARIN Centre: the IvdNT collaborates in projects with many national and international parties, and the services of the IvdNT CLARIN Centre are continuously updated according to the needs of those collaborations. Furthermore, the IvdNT CLARIN Centre participates in spearheading projects such as CLARIAH, which aim to provide a digital infrastructure for research. We do not have a comprehensive Preservation Plan, but we have a number of SOPs that together may be considered a surrogate. The SOPs are internal documents written in Dutch; relevant examples are 'Periodical Maintenance Check', 'Ingest Data', 'Ingest Web apps', 'Archiving' and 'Distribution'. Through contact with our user base we learn how to improve our data sets and services. At the request of one of the reviewers, we submit the most relevant internal document, together with an explanation in English, through the CoreTrustSeal.org administration.
6. Access. This entity is responsible for the interaction with consumers. It provides information about the available products and is responsible for communication with consumers. Implications for IvdNT CLARIN Centre: information about products is disseminated through a number of portals like OLAC, ELRA Universal Catalogue and CLARIN. This information points interested parties to our products. Questions can be directed to the service desk.
For converting images and video we rely on archives that specialize in these kinds of resources.
Trained repository staff check the completeness of the documentation provided by the depositor. Metadata and a detailed description of the data are produced in concert with the depositor. This is feasible because, given the nature of the repository, relatively few deposits are made per year. The Repository Servicedesk stays in contact with the depositor in order to answer questions from users. Product pages display all available information on the data. The Repository Servicedesk collects feedback from users and provides updates of the data if necessary.
The ingest procedures used by data producers and staff are documented. Other processes related to long-term preservation (data conversions, version control, etc.) still need to be described in detail and currently exist only in draft form.
As stated above, data is made available both through tools and applications on the web and as downloads, relevant to the scientific domain of the IvdNT. The software, and thereby access to the data, is tested periodically, typically when a new browser version is released.
Research data is currently available in formats suggested, required and frequently used by the data consumers or in formats in which IvdNT has the highest confidence with regard to sustainability.
The resource browser (IvdNT CLARIN portal) allows access to all data collections available.
All visible metadata and some metadata extracted from data files are indexed and searchable. Queries can contain special operators, field names, wildcards, etc., and users can refine results using facets.
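Wildcard search followed by facet refinement, as described above, can be illustrated in a few lines. This is a toy in-memory sketch, not the portal's search engine: the record fields (`title`, `type`) and the placeholder records in the usage example are hypothetical, and a production index would typically use a dedicated search server instead.

```python
from collections import Counter
from fnmatch import fnmatchcase


def search(records: list[dict], pattern: str, field: str = "title") -> list[dict]:
    """Case-insensitive wildcard match (* and ?) on one metadata field."""
    return [r for r in records if fnmatchcase(r.get(field, "").lower(), pattern.lower())]


def facet_counts(records: list[dict], facet: str) -> Counter:
    """Count hits per value of a facet field, as displayed next to a result list."""
    return Counter(r[facet] for r in records if facet in r)


def refine(records: list[dict], **selected: str) -> list[dict]:
    """Keep only the records matching every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]
```

Usage with placeholder records: searching `"corpus*"` over `[{"title": "Corpus A", "type": "text"}, {"title": "Corpus B", "type": "speech"}, {"title": "Lexicon C", "type": "lexicon"}]` returns the two corpora, `facet_counts(hits, "type")` shows one hit each for `text` and `speech`, and `refine(hits, type="speech")` narrows the result to Corpus B.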
All objects maintained by the IvdNT CLARIN Centre are assigned a unique, persistent identifier with prefix 10032.
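Identifiers of the form prefix/suffix with a purely numeric prefix such as 10032 are, we assume, Handle System identifiers, which resolve through the global proxy at `https://hdl.handle.net/`. The sketch below checks the prefix and builds a resolvable URL; the suffix in the usage note is a made-up example, not a real IvdNT identifier.

```python
from urllib.parse import quote

PREFIX = "10032"                       # prefix stated in the text
RESOLVER = "https://hdl.handle.net/"   # global Handle proxy (assumed resolution route)


def resolver_url(handle: str) -> str:
    """Turn 'prefix/suffix' into a resolvable URL, rejecting non-IvdNT prefixes."""
    prefix, sep, suffix = handle.partition("/")
    if prefix != PREFIX or not sep or not suffix:
        raise ValueError(f"not an IvdNT handle: {handle!r}")
    # Percent-encode unsafe characters in the suffix; '/' is kept as-is.
    return RESOLVER + prefix + "/" + quote(suffix)
```

For example, the hypothetical handle `10032/example-1` maps to `https://hdl.handle.net/10032/example-1`.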
All metadata can be harvested via the OAI-PMH protocol.
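A harvester pages through an OAI-PMH endpoint with the `ListRecords` verb, following `resumptionToken`s until an empty token marks the last page. The sketch below builds the requests and parses one response page with the standard library; the endpoint URL is a placeholder, and while `oai_dc` is the Dublin Core format every OAI-PMH repository must support, the prefix used for CMDI records (often `cmdi`) is an assumption to be checked with the endpoint's `ListMetadataFormats` response.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     token: Optional[str] = None) -> str:
    """Build a ListRecords request. resumptionToken is an exclusive argument,
    so it replaces metadataPrefix on follow-up requests."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = metadata_prefix
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_bytes: bytes) -> tuple[list[str], Optional[str]]:
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_bytes)
    ids = [h.text for h in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token
```

A harvesting loop would fetch `list_records_url(base)`, call `parse_page`, and repeat with `token=...` until the returned token is `None`.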
As noted above, open file formats are used as much as possible, small-volume conversions due to file format obsolescence are handled, and XML is used for textual resources whenever possible. The IvdNT follows the ‘Reference Model for an Open Archival Information System (OAIS)’ (‘CCSDS 650.0-M-2’), published by CCSDS (‘http://public.ccsds.org/publications/MagentaBooks.aspx’).
We already mentioned the Component MetaData Infrastructure (CMDI).
We have developed SOPs for the ingest, preservation, management, archiving and distribution of data, based to a large extent on the OAIS Reference Model. We consider our product catalogue and the frequency of deposits too small for a full implementation of that model.
The core infrastructure has been developed with the keywords “state-of-the-art” and “proven technology”.
The office automation is based on Microsoft solutions, and the virtualization on VMware in conjunction with NetApp hardware. HP server and storage solutions are used. The environment for research and for running applications is based on Linux. Veritas is used for the back-ups.
When open-source, community-supported software conforms to the keywords mentioned above, we use it (for example, Cacti and Nagios for monitoring, Shibboleth for federated identity, and OAI-PMH for sharing metadata).
An overview with documentation is stored in the internal wiki.
At present the IvdNT is in the middle of a migration to the latest Windows Server versions and CentOS 7. We regularly seek advice from Centric for vital migrations like these, and always do so for the acquisition of new hardware.
The network is an integral part of the network of Leiden University.
There is hardware redundancy to prevent technical failures:
- All hardware is equipped with a double power supply connected to two different power suppliers (companies) provided by the hosting organisation (Leiden University).
- Power continuity is provided by double power supplies and by emergency generators in case of a full blackout.
- Spindle sets holding operating system data are configured as RAID 1.
- Spindle sets holding other data are configured as RAID 50, with one hot standby for every five spindles.
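The cost of this redundancy can be worked out with simple arithmetic. RAID 50 stripes data across at least two RAID 5 groups, and each group gives up one disk's worth of capacity to parity. The sketch below computes usable capacity and the number of hot spares implied by "one hot standby for every five spindles"; the disk counts and sizes are hypothetical, since the source does not state them.

```python
import math


def raid50_capacity(active_disks: int, group_size: int, disk_tb: float) -> float:
    """Usable capacity of RAID 50: striped RAID 5 groups, each losing one disk to parity."""
    groups, rest = divmod(active_disks, group_size)
    if group_size < 3 or rest or groups < 2:
        raise ValueError("need >= 2 RAID 5 groups of >= 3 disks that divide the disk count")
    return groups * (group_size - 1) * disk_tb


def hot_spares(active_disks: int, per: int = 5) -> int:
    """One hot standby for every `per` spindles, rounded up."""
    return math.ceil(active_disks / per)
```

With a hypothetical 20 active 4 TB disks in four 5-disk groups, usable capacity is 4 × (5 − 1) × 4 TB = 64 TB, plus 4 hot spares.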
Backup and Recovery
Disaster recovery back-ups are made every week at virtual machine level. Emergency recovery consists of building a replacement VMware infrastructure with the back-up software and tape drives, and restoring complete VMs from tape. This last step is tested once a year.
Copies of the disaster recovery data and of the long-term storage are kept off-site.
Back-ups to guard against procedural corruption (human error) are made daily and kept in a tiered system for 3, 12 and 84 months.
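The 3/12/84-month tiers resemble a grandfather-father-son rotation. The sketch below decides whether a daily back-up is still retained, assuming (the source does not specify this) that the monthly tier keeps the back-up from the 1st of each month and the yearly tier keeps the one from 1 January; 84 months matches the seven-year retention mentioned earlier.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def keep(backup_day: date, today: date) -> bool:
    """Tiered retention: daily sets for 3 months, monthly for 12, yearly for 84."""
    age = months_between(backup_day, today)
    if age < 3:
        return True   # daily tier
    if backup_day.day == 1 and age < 12:
        return True   # monthly tier (assumed: back-up of the 1st of each month)
    if backup_day.month == 1 and backup_day.day == 1 and age < 84:
        return True   # yearly tier (assumed: back-up of 1 January)
    return False
```

On 10 December 2018, for instance, this keeps the back-ups of 15 November 2018, 1 June 2018 and 1 January 2016, but discards those of 15 June 2018 and 1 January 2010.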
Outside security relies on scans by the security department of Leiden University as hosting provider. They also maintain our outside firewall.
Inside security relies on a virtually segmented network with firewalls between the segments.
Virus threats are mitigated by Trend Micro desktop software and ClamAV (clamd) mail-scanning software.
Internal access requires Active Directory authentication, authorization and accounting. Remote desktop access is secured by a token-based two-factor authentication system.
Security logs are stored off-server.
The security officer role is part of our system administrator function. No specific risk analysis tools are used.