The Data Seal of Approval board hereby confirms that the Trusted Digital repository Mendeley Data complies with the guidelines version 2017-2019 set by the Data Seal of Approval Board.
The afore-mentioned repository has therefore acquired the Data Seal of Approval of 2016 on June 22, 2017.
The Trusted Digital repository is allowed to place an image of the Data Seal of Approval logo corresponding to the guidelines version date on their website. This image must link to this file which is hosted on the Data Seal of Approval website.
The Data Seal of Approval Board
|Guidelines Version:||2017-2019 | November 10, 2016|
|Guidelines Information Booklet:||DSA-booklet_2017-2019.pdf|
|All Guidelines Documentation:||Documentation|
|Seal Acquiry Date:||Jun. 22, 2017|
|For the latest version of the awarded DSA for this repository please visit our website:||
|Previously Acquired Seals:||None|
|This repository is owned by:||
1) Repository Type:
Other - Mendeley Data is a generalist open research data repository, applicable to all areas of science. All file formats and types may be uploaded, although, as indicated in further detail in the application, certain file types more suitable for preservation are preferred. Mendeley Data aims to openly interoperate with other repositories and systems, via implementation of open standards and a public API.
2) Brief Description of the Repository’s Designated Community:
Mendeley Data’s Designated Community is similarly broad, as it may include researchers in any field of science. We aim to support the data needs of both institutional researchers, and authors submitting papers for publication.
3) Level of Curation Performed:
B. Every published dataset is reviewed by a trained Mendeley Data reviewer, to verify it is scientific in nature, does not constitute a previously published article, and as far as possible that it does not contain personally identifiable information (as described in more detail in Guideline 4). In cases where the dataset constitutes valid research data but metadata is insufficient or incomplete, feedback will be provided to the author to encourage them to provide additional descriptive information. As the repository is a generalist one, we cannot provide feedback on or curation of the scientific content of datasets. Additionally, it is at the author's discretion to provide additional description in response to our feedback - we don't currently enforce this.
4) Outsource partners:
Mendeley Data is a repository launched by Mendeley (part of Elsevier) in 2015, which has an explicit mission to store and preserve data deposited to our repository, and to make the data accessible to Data Consumers for the long term. Our mission is set out here.
The mission has received backing from senior internal stakeholders, including the Managing Director of Elsevier Research Products.
Research Data Management is a long-term strategic priority for Elsevier. Elsevier has been active in this space since initiating its Data Linking and Research Elements programmes in 2013. Following these successes, a strategic project was started in 2014 that resulted in the pilot project for Mendeley Data in 2015. Since the full launch in 2016, Mendeley Data has been part of the 5-year operation plan that was accepted by the Elsevier Board in May 2016.
We communicate to our end users our commitment to preserve datasets deposited to the repository: on our homepage, within our FAQs, in our mission statement, and on our archiving processes page. For more information on our approach to preservation, see R10.
To ensure the long-term archiving of datasets posted to the repository, Elsevier has established a non-exclusive archival service provider relationship with Data Archiving and Networked Services (DANS), of the Netherlands.
When posting a dataset to Mendeley Data, Data Producers must select a licence governing access and usage, from a set of licence options, which include Creative Commons licences, software licences and hardware licences.
It is not possible to submit a dataset without selecting a licence, so all datasets have a licence. In order to support the Data Producer’s licence selection process, we provide explanatory text for each licence, and a link to the relevant licence provider’s webpage for full information.
The licences we currently make available to Data Producers are:
These licences govern the usage, distribution and author attribution requirements of datasets.
The licence selected by the Data Producer is displayed on the public dataset page, along with a description of the licence’s requirements. By using the data, Data Consumers therefore accept that they will follow the licence requirements.
If an instance of non-compliance with a licence by a Data Consumer is discovered, then the repository will contact the Data Producer to notify them:
Usage of the repository, either as a Data Producer or a Data Consumer, is also governed by our Terms. See requirement R4 for more detail.
Mendeley Data commits to ensuring long-term availability, access to and preservation of datasets submitted to the Repository, as set out in our Mission statement.
The Repository has a contract with data archive and preservation service DANS-KNAW (DANS), as an archival service provider, for long-term digital archiving for scholarly datasets, in which DANS-KNAW undertakes to provide for “the long-term storage and maintenance of the Datasets delivered … and their preservation in a form that will provide security as to data integrity and usability”.
See more on preservation in R10.
In the event of cessation of funding, ongoing access to and preservation of datasets is guaranteed by DANS.
DANS is contracted, in the event that “Elsevier terminates the research data service” or if “Elsevier becomes insolvent, etc”, to “take over the hosting and provision of the ingested Datasets to be made available to the public”.
The provision of datasets to the public entails:
As far as possible, Mendeley Data aims to ensure datasets posted to and made available by the repository comply with confidentiality and ethics guidelines.
Terms which govern the usage of the repository, as a Data Producer or Data Consumer, are published here.
The Terms require (4.4.7) that authors “have obtained all necessary consents”; and that “data is suitably anonymized wherever appropriate”. This means that data with disclosure risk must not be shared unless consent has been given or the data have been anonymised.
In our FAQs ("What are the requirements for Mendeley Data datasets?"), we provide guidance to Data Producers, that datasets must not “contain sensitive information (for example, but not limited to: patient details, dates of birth etc.)”
Every published dataset is reviewed by a trained Mendeley Data reviewer. Any datasets found to contain confidential or personally identifiable information (PII) will be taken down per our review and takedown process. We provide a “Notify” mechanism on every dataset page, such that Data Consumers can notify us if inappropriate information is found.
Every Mendeley Data reviewer is trained to examine datasets to establish, to the best extent possible, that the dataset does not contain copyrighted or previously published articles; is scientific in nature; and does not contain personally identifiable information. The legal responsibility to ensure the latter, per our Terms, is with the Data Producer.
Future features are planned to further increase our support for depositing confidential information securely:
Once a dataset has passed review, it is subsequently forwarded to DANS archive.
The DANS review process checks for confidential information as follows: “[data are] processed by a staff member at DANS in accordance with a standard data processing protocol. … On the basis of this protocol, the following types of verification have been performed since the introduction of DANS in 2005:
…Verification of the presence of privacy-sensitive data, both in the files and in the metadata” (DANS contract).
DANS may contact Mendeley Data to request action to be taken, if a file includes personally identifiable information, for instance exact names and exact dates of birth of survey respondents.
This is documented here.
In the case of any personal data being legitimately present within a dataset, these data may only be preserved, stored and used in line with the 8 data protection principles of the UK Data Protection Act (1998):
1 Personal data shall be processed fairly and lawfully.
2 Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.
3 Personal data shall be adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed.
4 Personal data shall be accurate and, where necessary, kept up to date.
5 Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes.
6 Personal data shall be processed in accordance with the rights of data subjects under this Act.
7 Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data.
8 Personal data shall not be transferred to a country or territory outside the European Economic Area unless that country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data.
Rev1: depending on the author to approve access requests to confidential datasets is a risk in case the author is no longer able or willing to respond. A contingency plan for such cases would be strongly recommended, e.g. request the author to assign a person or organisation who will take over the responsibility if needed.
Rev2: what happens in case of “orphan datasets” for future access requests?
Research Data Management is a long-term strategic priority for Elsevier. Mendeley Data is part of the 5-year operation plan that was accepted by the Elsevier Board in May 2016 and therefore has funding for that period.
Funding includes provisioning sufficient staff, including a software development team (to continue to evolve the product to meet the designated community’s requirements, and for maintenance, bug fixes and operations), a repository manager, and a repository support officer.
[Call out our ability to engage with the community and add features to meet the community’s needs and requirements]
In more detail, members of staff assigned to Mendeley Data include:
Mendeley Data dataset reviewers receive internal training to carry out their duties. This includes training in reviewing datasets (as described in more detail in R4), and communicating appropriately with Data Producers.
Expertise and affiliations: Members of the Elsevier research data management team publish and speak widely on research data management (from policy, product and technology points of view), at industry forums such as Research Data Alliance and SciDataCon.
Department members currently co-chair the following Research Data Alliance groups:
Elsevier is affiliated to FAIR principles (signatory), FORCE11 Joint declaration of Data Citation Principles, ICSU World Data System, etc.
Elsevier has convened a Research Data Management Advisory Board, comprising experts with professional RDM roles in institutions, from industry bodies etc. The Board meets regularly to provide strategic input on RDM activities.
Mendeley Data engages with the community to receive expert advice and guidance in the following ways:
Here is a list of Elsevier Research Data Management Advisory Board members.
Mendeley Data ensures integrity and authenticity of data posted to the repository through both process and technical measures.
Depositor identity is verified, in that the Data Producer must have registered a Mendeley account, providing their name and email address.
The dataset thenceforth is linked to the Data Producer’s account, and only the Data Producer may edit, manage and publish changes to the dataset.
Once a dataset is published, the dataset version is fixed and immutable. The Data Producer can edit their dataset, but these edits create a new version. Each dataset version is interlinked to the other versions, allowing a Data Consumer easily to see which is the latest version, and to navigate to access other versions. This allows for a dataset to evolve over time (e.g. for longitudinal studies), but for each version to be citable as a fixed record.
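The versioning behaviour described above can be illustrated with a minimal sketch. It is not the repository's implementation: the class names, fields and sample values are hypothetical, and it only shows the idea that a published version is frozen while every edit appends a new, linked version.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One published version; frozen, so it cannot be changed after creation."""
    number: int
    title: str
    files: Tuple[str, ...]


@dataclass
class Dataset:
    """A dataset modelled as an append-only sequence of published versions."""
    versions: List[DatasetVersion] = field(default_factory=list)

    def publish(self, title: str, files: Tuple[str, ...]) -> DatasetVersion:
        # An edit never alters an existing version; it appends a new one.
        version = DatasetVersion(len(self.versions) + 1, title, tuple(files))
        self.versions.append(version)
        return version

    def latest(self) -> DatasetVersion:
        """Return the most recently published version."""
        return self.versions[-1]


# Hypothetical longitudinal study: a second wave is added as version 2.
dataset = Dataset()
v1 = dataset.publish("Survey results", ("wave1.csv",))
v2 = dataset.publish("Survey results", ("wave1.csv", "wave2.csv"))
```

In this sketch `v1` remains available and unchanged after `v2` is published, which is what allows each version to be cited as a fixed record.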
The visible version history, along with versioned (and easily visually comparable) contributor details, description and steps to reproduce, and related links, provide an indication of the provenance of the dataset, in terms of origin and subsequent changes.
Links to metadata and to other datasets are maintained through our Related Links, which allows Data Producers to add links to any associated datasets, software, articles, protocols or other entities. These links are version-specific.
To ensure completeness of data and metadata, as described in more detail in R8, required metadata fields must be completed in order to publish the dataset.
When the dataset is created, the data files are stored on Amazon Simple Storage Service (S3), which ensures integrity of files by calculating checksums and immediately repairing corrupted files using redundant data: “Amazon S3 also regularly verifies the integrity of data stored using checksums. If Amazon S3 detects data corruption, it is repaired using redundant data. In addition, Amazon S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data.” (Source)
Furthermore, when a dataset is archived with long-term archiving partner DANS following review, “DANS-KNAW checks the checksum delivered with a dataset at ingest. DANS-KNAW runs regular checksums and will be able to detect and repair technical corruption within the archival environment.”
In addition, DANS may “in rare events [make] modifications to the content of a Dataset in the DANS-KNAW Archive after it has been delivered by ELSEVIER … Authenticity will be kept by recording modifications as provenance metadata. Versions will be kept where modifications to the content cannot be reversed.” [DANS contract]
In order to ensure that datasets are understandable to our Designated Community, certain requirements are placed on the data files and metadata which may be submitted.
We publish a list of preferred formats, to indicate to Data Producers which formats are suitable for long-term preservation, which is available here.
For these preferred formats, our long-term archive partner, DANS, “guarantees long-term usability”, while for all formats it “guarantees long-term bit-level preservation” (Contract between DANS-KNAW and Elsevier - article 3.1 of the Archiving Agreement Elsevier – DANS).
Additional criteria for datasets are as follows:
Datasets must be:
Datasets must not:
The following mandatory metadata fields must be completed in order to publish a dataset via the web interface or API (completion of mandatory metadata is therefore enforced at publication and does not require review):
The following optional metadata fields may be completed:
While the set of mandatory metadata fields may be small, this is necessarily so to be able to serve our broad Designated Community, posting datasets from a wide range of disciplines; a future enhancement is planned such that partner institutions and journals/publishers will be able to specify more detailed metadata templates for datasets submitted.
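A check of this kind, run before publication, might look like the following sketch. The field names are assumptions chosen for illustration (title, description and licence are described elsewhere in this application as required); the function names are hypothetical and this is not the repository's actual validation code.

```python
from typing import Dict, List, Optional

# Assumption: illustrative field names only, not the repository's real schema.
MANDATORY_FIELDS = ("title", "description", "licence")


def missing_mandatory_fields(metadata: Dict[str, Optional[str]]) -> List[str]:
    """Return the mandatory fields that are absent or blank."""
    return [
        name
        for name in MANDATORY_FIELDS
        if not (metadata.get(name) or "").strip()
    ]


def can_publish(metadata: Dict[str, Optional[str]]) -> bool:
    """A dataset may be published only when no mandatory field is missing."""
    return not missing_mandatory_fields(metadata)


# Hypothetical draft with a blank description and no licence selected.
draft = {"title": "Example dataset", "description": "   "}
problems = missing_mandatory_fields(draft)
```

Because the check is mechanical, it can be applied identically in a web form and an API, leaving reviewers free to concentrate on the criteria that need human judgement.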
In order to ensure, as far as possible, compliance with the above criteria, every dataset submitted to the repository is reviewed by a trained Mendeley Data reviewer.
The reviewer audits the dataset to ensure it complies with the above requirements, so that it will be relevant and understandable to the Designated Community. In addition, reviewers may add curated links to associated peer-reviewed literature; Data Producers and Consumers cannot add these official links.
Datasets which are found to be already published papers, to be non-scientific, or to contain confidential information, will be removed from the public archive, and the Data Producer notified by email so they may post a new dataset or new version. The DOI which was minted for the dataset at publication will then resolve to a web page indicating the reason the dataset has been removed.
Datasets which pass review are subsequently forwarded to DANS to be long-term archived.
The DANS review process checks file formats as follows: “[data are] processed by a staff member at DANS in accordance with a standard data processing protocol. … On the basis of this protocol, the following types of verification have been performed since the introduction of DANS in 2005:
…Verification of the file format. In the future as well, it should still be possible to open and use the data files as well as the documentation files. The verification is performed on the basis of a list of preferred file formats.”
At ingest, the following information about data files is captured:
We do not currently carry out Collection Development for datasets in particular themes, but we plan a future enhancement to enable community members to do so.
The above information is documented here.
Data Producers may create (or in OAIS terms, “Ingest”) datasets either via the web interface, or the publicly-documented REST API (documentation available here).
Data Producers may edit and privately share a draft dataset version before publishing. On publication, the version becomes fixed, and the dataset is accepted from the Ingest function by the repository and assigned to Storage.
After the dataset has been reviewed by the Mendeley Data reviewer (as described in more detail in R9), it is subsequently forwarded to DANS for long-term archiving and preservation.
The dataset Access function is fulfilled by Mendeley Data, which provides dataset record view and file downloads at the dataset URL (resolvable via the dataset DOI), dataset listings, and search capability.
In addition, DANS provides Access to datasets’ metadata records only, via the EASY archive. Mendeley Data datasets in EASY archive are available here.
Our archival storage processes are documented here. These are reviewed and updated each time a change to our model occurs, for instance in response to a new requirement.
Data Producers must register a Mendeley account, and must authenticate via Elsevier's Access & Entitlements service, before being able to post data.
Data is stored with Amazon S3, part of Amazon Web Services (AWS); no data is stored on local Elsevier Servers. Details of AWS security policies can be found here.
Any corruption of datasets creates error logs; and backups are kept to restore data. Automated database backups happen every day whilst online, with a retention period currently set to 7 days. These backups are stored in the Amazon AWS S3 bucket. Amazon S3 synchronously stores “data across multiple facilities … the objects are stored, Amazon S3 maintains their durability by quickly detecting and repairing any lost redundancy.”
Point-in-time recovery is possible to within one second. Restores have been tested periodically to provision development and testing environments. The S3 bucket is not backed up, owing to its high data durability. Amazon S3 storage is designed to provide 99.999999999% durability of objects over a given year. Please see page 5 here for further information.
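To give a sense of scale for the durability figure quoted above, the following sketch converts it into an expected loss rate. It assumes object losses are independent and uses an arbitrary object count of ten million for illustration; the function name is hypothetical.

```python
from decimal import Decimal

# Design target quoted above: eleven nines of annual durability.
DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(object_count: int) -> Decimal:
    """Expected number of objects lost per year, assuming independent losses."""
    return (Decimal(1) - DURABILITY) * object_count


# Illustrative object count (an assumption, not a repository statistic).
losses = expected_annual_losses(10_000_000)
years_per_loss = 1 / losses
```

Under these assumptions, ten million stored objects correspond to an expected loss of one object roughly every 10,000 years.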
In addition, datasets which pass review are archived with DANS ensuring long-term preservation, where they also benefit from redundancy and backups. “DANS-KNAW stores and archives three copies of any dataset received: two at the same location and one at a different location with a different disaster threat”.
Furthermore, “DANS-KNAW checks the checksum delivered with a dataset at ingest. DANS-KNAW runs regular checksums and will be able to detect and repair technical corruption within the archival environment.”
Mendeley Data uses checksums to verify the datasets and data files received are exactly those which were sent to DANS. Each file in the dataset is checksummed and so is each part of a transmitted deposit.
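Per-file checksum verification of a deposit can be sketched as follows. The choice of SHA-256, the manifest layout and the function names are assumptions made for illustration; the checksum algorithm and transfer format actually used between Mendeley Data and DANS are not specified here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path) -> Dict[str, str]:
    """Map each file's relative path to its checksum (computed by the sender)."""
    return {
        p.relative_to(directory).as_posix(): sha256_of(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def verify_manifest(directory: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the files that are missing or whose checksum differs (receiver side)."""
    failures = []
    for name, expected in manifest.items():
        candidate = directory / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(name)
    return failures


# Demonstration on a temporary directory with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").write_bytes(b"a,b\n1,2\n")
    manifest = build_manifest(root)
    intact = verify_manifest(root, manifest)
    (root / "data.csv").write_bytes(b"a,b\n1,3\n")  # simulate corruption
    corrupted = verify_manifest(root, manifest)
```

An empty result from `verify_manifest` indicates that every file received matches what was sent; any name in the list identifies a file to re-transmit or repair.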
This preservation function encompasses: taking delivery of the dataset ingested, storing it, and ensuring it is archived, and accessible and usable to Data Consumers.
To ensure long-term availability of the data, DANS is contracted to “maintain the integrity of the Datasets content … and commits to the long-term storage and maintenance of these materials.”
In terms of active preservation of file formats, “DANS-KNAW will employ appropriate technical solutions to adapt to changes in storage or access technology and to otherwise ensure the continued availability (in accordance with this Agreement) of the Datasets.”
In order to ensure that this archiving function is fulfilled:
For data files which are in the preferred formats published by DANS, and in turn by ourselves, DANS undertakes to ensure they are usable over the long term, while for all other formats, they will be accessible.
When the dataset is published, the Data Producer affirms via tickbox in the web form, that the dataset version will become fixed and immutable.
According to the Mendeley Data Terms (published here), Data Producers grant Mendeley Data the rights to copy, transform, and store the items, as well as provide access to them: “For Research Data that you make publicly available on the Site, you grant us a perpetual, irrevocable, worldwide, non-exclusive right and license to publish, extract, reformat, adapt, build upon, index, re-distribute, link to and otherwise use all or any part of the Research Data in all forms and media (whether now known or later developed), and to permit others to do so.”
The preservation process is publicly documented here.
Mendeley Data aims to facilitate Data Producers providing high quality datasets, and Data Consumers being able to evaluate datasets based on quality indicators, so they may find data useful for their research.
For every dataset submitted, required metadata must be supplied, and file format advice is provided, as indicated in R8. Metadata fields such as title, description, subject discipline category, steps to reproduce (methods), and related links and articles, aim to ensure that there is enough available information about the data such that the Designated Community can assess the quality of the data.
In addition, the Data Producer can provide additional links to related datasets, articles and software, which give extra context for the dataset, both in terms of content, and in terms of potential quality of research. Mendeley Data reviewers additionally add curated links to peer reviewed literature.
As indicated in R8, a future enhancement is planned such that partner institutions and journals/publishers will be able to specify more detailed metadata.
In order to help the Designated Community evaluate the quality of datasets, we provide, where relevant:
Additionally, in cases of authors posting non-data datasets (e.g. a research paper), we take the dataset down and contact the author to suggest submitting research data instead.
Elsevier participates in the RDA Data Fitness For Use Working Group in order to contribute to community-approved standards for dataset quality.
Here are links to publicly available listings of Mendeley Data datasets:
Mendeley Data’s workflow for archiving datasets is archived here. Any changes to workflow will be recorded as new versions of this page, and links provided to previous versions.
A summary of the archiving workflow is as follows:
Every public dataset is available via our public dataset listing webpage, and via our public API.
Keyword searching of dataset metadata and file contents by Data Consumers will be supported, to enable effective data discovery. This keyword search capability will be powered by DataSearch, a data search engine developed by Elsevier. Data Consumers will be able to find relevant results from within Mendeley Data, and also other repositories indexed by DataSearch. This capability is expected in the first half of 2017.
Every public dataset is provided with a unique persistent identifier, in the form of a versioned DataCite DOI (draft dataset versions are provided with a reserved DOI, so that the eventual dataset citation can be known to the Data Producer in advance of publication, to enable it to be referenced in a manuscript for example).
Datasets may also be accessed via dataset links on article pages, where the article platform supports this.
In addition, dataset metadata is available in the DataCite registry, and the DANS EASY archive.
A planned future enhancement is to allow links between articles and datasets to be discovered via the Data Literature Interlinking platform, a Scholix-compliant platform.
Metadata can be harvested via standard markup: at the moment various metadata fields are marked up in Dublin Core and Google Science Datasets markup. A planned future enhancement is to add support for W3C DCAT markup. An additional planned future enhancement is to support harvesting via OAI-PMH protocol.
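As an illustration of harvesting from page markup, the sketch below collects Dublin Core fields from HTML `meta` elements using only the Python standard library. It assumes the common `DC.<element>` naming convention for such tags; the sample page and its values are invented, and the markup actually emitted by dataset pages may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only Dublin Core entries; repeated elements accumulate in a list.
        if name.lower().startswith("dc.") and content is not None:
            self.fields.setdefault(name[3:].lower(), []).append(content)


# Invented sample page for demonstration purposes only.
SAMPLE_PAGE = """
<html><head>
<meta name="DC.title" content="Example dataset">
<meta name="DC.creator" content="A. Researcher">
<meta name="DC.creator" content="B. Researcher">
<meta name="viewport" content="width=device-width">
</head></html>
"""

parser = DublinCoreParser()
parser.feed(SAMPLE_PAGE)
```

After parsing, `parser.fields` holds one list per Dublin Core element, so repeated elements such as multiple creators are preserved.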
A dataset citation in standard format is provided for each dataset, to facilitate citing of datasets.
The repository aims to ensure that sufficient metadata in high enough quality are captured to support understanding and (re)use of data.
As described in R8, mandatory and optional metadata are provided by the Data Producer, when publishing a dataset.
To help ensure understandability of the data to consumers, Data Producers must provide a title and description; they are also encouraged to indicate the steps to reproduce the research which led to the data, for instance methods, workflow and/or software used; and to provide links to any software or other datasets used in generating the data, or associated articles.
In addition, Data Producers may provide a description for the individual files within the dataset, for instance describing the contents, related findings, the format, or the processes which led to the individual file.
As a repository, we are exploring ways to encourage Data Producers to provide intelligible and consistent file labeling and contents, possibly including a data dictionary, schema verification, data curation, etc. This is an area for ongoing improvement.
As described in R8, a future enhancement is planned to allow partner research institutions and journals/publishers to provide subject-specific / content-oriented metadata templates for associated datasets.
As also described in R8, preferred file formats advice is provided to Data Producers.
As described in R2, the Data Producer must select an appropriate licence from a pre-defined range of options, so that they can set their desired conditions for reuse of the data.
To address the issue of future evolution of formats, and any future migrations therefore needed, our long-term archive partner DANS undertakes to ensure all files uploaded in preferred formats are usable in perpetuity, while all formats will be preserved:
“DANS-KNAW accepts and understands that the best standards and procedures for the storage, manipulation and access of digital materials are likely to evolve over time and that it will avail itself of the latest technologies in cooperation with ELSEVIER.” (DANS contract)
The repository core infrastructural software is developed and maintained in-house, by a dedicated software development team, as described in R5.
The repository uses metadata standards including Dublin Core and Google Science Datasets, for dataset metadata markup. Reference standards are an ongoing area for review and improvement, and we aim to add support for experimental and subject-specific metadata standards in future.
The repository software is currently closed source and proprietary so documentation is not publicly available; however RESTful APIs are maintained which provide access to all public repository functions – these are publicly documented here.
Funding has been allocated to support future infrastructure development in line with our needs.
Mendeley Data is hosted, and research data stored, on Amazon S3, which provides a service across high-performance connections. Please refer to http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html for more information.
Amazon S3 (Simple Storage Service) servers ensure a reliable and stable service at all times:
Mendeley Data takes a proactive approach to security of research data and user data.
Regular penetration testing is carried out to ensure service is secure against attack. All previous penetration tests have failed to breach the service; recommendations issuing from tests have been implemented.
Amazon's AWS security is detailed here. Data is stored in a MySQL database hosted in an Amazon Relational Database Service (Amazon RDS), which has the ability to encrypt data at rest and/or in transit using SSL. Please refer to the information here for further details.
Our data is stored with Amazon S3 rather than local Elsevier Servers, therefore our data falls under Amazon business continuity and disaster recovery arrangements.
Amazon Cloud Services SLA availability is over 99.9% over a given year and comes with cross-region replication. The data centres are within the EU, specifically Ireland.
Please refer to https://aws.amazon.com/ec2/sla/, https://aws.amazon.com/rds/sla/ and https://aws.amazon.com/s3/sla/ for further information.
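For context, an availability percentage can be translated into a maximum downtime budget. The sketch below does this on the annual basis stated above, for a non-leap year; the function name is hypothetical, and the measurement period defined in the SLAs themselves may differ.

```python
from decimal import Decimal

HOURS_PER_YEAR = 365 * 24  # non-leap year


def max_downtime_hours(availability_percent: str) -> Decimal:
    """Hours per year a service may be unavailable at the given availability."""
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable * HOURS_PER_YEAR


budget = max_downtime_hours("99.9")
```

An availability of over 99.9% therefore corresponds to less than about 8.76 hours of unavailability in a year.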
Our AWS usage has been configured to use multi availability zones. AWS business continuity program can be found in https://d0.awsstatic.com/whitepapers/aws-security-whitepaper.pdf.
Additionally, our long-term archive partner DANS provides multiple backups and redundancy, from which datasets could be restored in the very unlikely event they could not be restored from Amazon S3.
Access to the repository functions, for ingest or access, is not currently dependent on Mendeley Data employee availability. Elsevier plans and maintains comprehensive security, disaster recovery, and business continuity plans (including employee availability), and arrangements for Mendeley Data fall under these.
Mendeley Data aims to interoperate with and contribute to the ecosystem of data repositories and other data services, for example for discovery, persistence and access to data.
- Indexed by Data Search – a cross-repository search engine developed by Elsevier
- Metadata records available in DANS EASY archive
- Metadata records available in DataCite Explorer search index
Open APIs serving dataset metadata and file download URLs enable indexing by other services.
In addition, our structured metadata, marked up with Dublin Core and Google Science Datasets markup, enables indexing. We are working with Dutch Techcentre for Life Sciences (DTL) to make our data more FAIR (Findable, Accessible, Interoperable, Reusable), and able to be discovered via FAIR Dataports.