The Data Seal of Approval board hereby confirms that the Trusted Digital repository Strasbourg Astronomical Data Center (CDS) complies with the guidelines version 2014-2017 set by the Data Seal of Approval Board.
The afore-mentioned repository has therefore acquired the Data Seal of Approval of 2013 on August 7, 2014.
The Trusted Digital repository is allowed to place an image of the Data Seal of Approval logo corresponding to the guidelines version date on their website. This image must link to this file which is hosted on the Data Seal of Approval website.
The Data Seal of Approval Board
|Guidelines Version:||2014-2017 | July 19, 2013|
|Guidelines Information Booklet:||DSA-booklet_2014-2017.pdf|
|All Guidelines Documentation:||Documentation|
|Repository:||Strasbourg Astronomical Data Center (CDS)|
|Seal Acquiry Date:||Aug. 07, 2014|
|For the latest version of the awarded DSA for this repository please visit our website:||
|Previously Acquired Seals:||None|
|This repository is owned by:||
The Strasbourg Astronomical Data Center (CDS) is dedicated to the collection and worldwide distribution of astronomical data and related information.
The CDS hosts the SIMBAD astronomical database, the world reference database for the identification of astronomical objects; VizieR, the catalogue service for the CDS reference collection of astronomical catalogues and tables published in academic journals; and the Aladin interactive software sky atlas for access, visualization and analysis of astronomical images, surveys, catalogues, databases and related data.
The CDS mission is to:
The CDS cooperates with the French Space Agency CNES, the European Space Agency ESA, the European Southern Observatory ESO, the US National Aeronautics and Space Administration NASA (with a long-term collaboration with the Astrophysics Data System ADS and the NASA Extragalactic Database NED), astronomical academic journals, and with other data and service providers around the world such as the National Observatory of China (NAOC), the Inter-University Centre of Astronomy and Astrophysics (IUCAA Pune, India), the National Observatory of Japan (NAOJ), the Institute of Astronomy of the Russian Academy of Sciences (INASAN) and the South African Astronomical Observatory (SAAO). The CDS hosts mirrors of NASA ADS and of the international journal Astronomy and Astrophysics.
CDS is a member of the World Data System of the International Council for Science ICSU and has thus been certified following WDS criteria.
The DSA label is sought only for VizieR and Aladin. The SIMBAD database is continuously updated with new information extracted from academic publications, so we do not consider it suitable for such a label.
Inputs for the Aladin and VizieR services are tables issued from astronomical academic journals such as Astronomy & Astrophysics (A&A), Astronomical Journal (AJ), Astrophysical Journal (ApJ) and Monthly Notices of the Royal Astronomical Society (MNRAS), as well as catalogues and image surveys supplied by international agencies and centers such as NASA, ESA, ESO and CADC (Canadian Astronomy Data Centre), research teams and individual researchers.
The data arriving at the CDS are obtained from reliable sources, agencies and large projects and/or are attached to a refereed publication whose reference is given. They are kept and redistributed in a format, and with attached metadata, that allow them to be examined and scientifically reused. As explained later, they are compliant with disciplinary standards.
A standardized README file is attached to each catalogue. It contains the description of the catalogue content and information about its origin. For data linked to a publication, the article reference, abstract and date are given plus a link to the publication.
Example of data linked to a publication:
Example of data linked to a telescope:
Formats supported by the CDS are mainly:
There is an interface allowing astronomers to submit their catalogue and its description (see item 3) for ingestion and checks.
Submission forms allow the data producers to add metadata which will then be validated by CDS staff (researchers, specialized librarians). The data and metadata follow standards adopted by CDS.
The data producer can directly deposit the data by the use of submission forms:
The CDS also extracts tables from the journals and builds the required metadata.
For catalogues supplied by agencies or teams which generate big data, metadata are built according to the documentation supplied by the data producer. They are generally the object of discussions between data producer and CDS managers.
Metadata and standards for tables/catalogues (CDS), in agreement with the journal Astronomy&Astrophysics and other journals: http://cds.u-strasbg.fr/doc/catstd.htx
Bibliographical reference standard: http://cdsweb.u-strasbg.fr/simbad/refcode/refcode-paper.html
Recommendations to the authors on the journal sites:
Process diagram for data quality checks:
http://cds.u-strasbg.fr//vizier-org/OAISTranslation.html (Description of VizieR pipeline)
Tools developed at the CDS transform the received data into a dedicated format. These tools verify the coherence of the data: number of lines, number of columns, column type (integer, float, short, char). This information is available in the README file.
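The coherence checks listed above (line count, column count, column type) can be illustrated with a minimal sketch. This is not the CDS tooling: the function name `check_table`, the `|` separator and the type validators are assumptions made for the example, and real CDS tables are described byte-by-byte rather than by a separator.

```python
from typing import Callable, Dict, List


def _is_int(text: str) -> bool:
    try:
        int(text)
        return True
    except ValueError:
        return False


def _is_float(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


# Hypothetical validators for the four column types named in the text.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "integer": _is_int,
    "short": lambda text: _is_int(text) and -32768 <= int(text) <= 32767,
    "float": _is_float,
    "char": lambda text: True,
}


def check_table(rows: List[str], column_types: List[str],
                expected_rows: int, sep: str = "|") -> List[str]:
    """Return a list of human-readable problems; an empty list means coherent."""
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} lines, found {len(rows)}")
    for lineno, row in enumerate(rows, start=1):
        fields = [field.strip() for field in row.split(sep)]
        if len(fields) != len(column_types):
            problems.append(
                f"line {lineno}: expected {len(column_types)} columns, "
                f"found {len(fields)}"
            )
            continue
        for colno, (value, ctype) in enumerate(zip(fields, column_types), start=1):
            if not VALIDATORS[ctype](value):
                problems.append(
                    f"line {lineno}, column {colno}: {value!r} is not a valid {ctype}"
                )
    return problems
```

Calling `check_table(rows, ["integer", "float", "char"], expected_rows=len(rows))` on a parsed table returns the mismatches, which a validator could then raise with the data provider.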
The persons in charge of data validation are specialized librarians. In case of problems, they discuss the issue with the data provider and/or CDS astronomers.
The missions of the CDS are explained in the following document:
The Strasbourg Astronomical Data Center (CDS) is dedicated to the collection and worldwide distribution of astronomical data and related information. One of its missions is to:
"collect useful information concerning astronomical objects that is available in computerized form"
All the distributed data are archived.
Data distributed by the CDS are covered by agreements with data producers (disciplinary journals, data producers of the discipline such as NASA, ESA, etc.).
Observational data are public according to a timetable established by the data producer. In very rare cases there can be a "proprietary period" during which its usage is reserved to the team which is producing the data. Data in VizieR linked to a publication are public when the paper is published even if the article itself is not yet in open access.
For example, the data policy of the international journal "Astronomy & Astrophysics":
It is mandatory for A&A authors to publish the data that are presented and discussed in articles and needed to reproduce the results. Archiving the data also increases the value of the article, and thus its impact in the community. Publication of the data, usually at the CDS (see below), should occur immediately upon acceptance of the article referencing them. Some common examples of data that must be archived are the measurements of radial velocities leading to the detection of planetary or stellar companions to stars, the photometric data used in asteroseismologic studies, etc. By data, we mean here not only primary observational material, but also tools of general interest such as catalogs, theoretical tables of lasting values, etc.
Whenever the primary observational data (e.g., the spectrograms that were used for determining radial velocities or redshifts) are archived at a facility such as ESO or HST and therefore publicly available, there is no need for authors to provide them to A&A; in this case, we'll archive only the reduced data (i.e., the radial velocities and the reduced photometric data in the examples given above). When primary data presented in articles are not publicly available through an institutional archive (e.g., the IRAM spectroscopic data), the calibrated data will be archived at the CDS.
By contract with A&A, the CDS stores the data that are published in A&A articles and graciously puts them at the disposal of the global community. The data are also linked to the general purpose data mining tools developed at the CDS and to the published articles through the ADS. The CDS requires the data tables to be in ascii format and each table is accompanied by a readme.txt file that describes the table’s content. The readme file format defines a standard that is used by all major astronomy journals. Primary data can also be archived at the CDS as graphics files in FITS format. This is of particular interest for spectrograms. At this point, no other formats than ascii and FITS are supported by the CDS for A&A data. Also by contract with the Journal, CDS provides help to A&A authors in order to prepare the archival files.
The data are stored on RAID level 5 or 6 disks and are backed up daily to a building distant from the data server. Nagios probes supervise both the low-level components (state of controllers, power supplies, logical, physical and virtual disks, fans, temperature, UPS, etc.) and the high-level services, and warn the engineers in charge in real time when a system failure raises a critical alert.
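As an illustration of the kind of probe mentioned above, here is a minimal sketch of a temperature check that follows the usual Nagios plugin convention (exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with a one-line status message). The thresholds and the function names are assumptions for the example, not the CDS configuration.

```python
from typing import List, Optional, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_temperature(celsius: Optional[float],
                         warn: float = 35.0,
                         crit: float = 45.0) -> Tuple[int, str]:
    """Map a temperature reading to a (return code, status line) pair.

    The warn/crit thresholds are hypothetical defaults.
    """
    if celsius is None:
        return UNKNOWN, "TEMPERATURE UNKNOWN - no reading available"
    if celsius >= crit:
        code = CRITICAL
    elif celsius >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"TEMPERATURE {LABELS[code]} - {celsius:.1f} C"


def main(argv: List[str]) -> int:
    """Entry point for a plugin script: print the status line, return the code."""
    reading = float(argv[1]) if len(argv) > 1 else None
    code, message = evaluate_temperature(reading)
    print(message)
    return code
```

A plugin script would end with `sys.exit(main(sys.argv))`, so that the monitoring server can read the exit code and alert the engineers on a CRITICAL result.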
Electrical installations, UPS (Uninterruptible Power Supply), cooling systems, firewalls, computers, networks, etc. are redundant to ensure high availability of the data repository.
The VizieR service has 9 mirror sites to mitigate any technical failure and ensure the best possible availability of service: ADAC (Astronomical Data Archives Center, Japan), CADC (Canadian Astronomy Data Centre), University of Cambridge Institute of Astronomy (UK), IUCAA (Inter-University Centre for Astronomy and Astrophysics, India), INASAN (Institute of Astronomy of the Russian Academy of Sciences), NAOC (National Astronomical Observatories, Chinese Academy of Sciences), JAC (Joint Astronomy Centre, Hawaii), CfA (Center for Astrophysics Harvard University, USA), SAAO (South African Astronomical Observatory, South Africa).
The ALADIN service has a mirror site at IAS (Institut d'Astrophysique Spatiale, Paris, France) for some data.
References for documented process:
http://cds.u-strasbg.fr//vizier-org/OAISTranslation.html (Description of VizieR pipeline, Procedures in use)
The data storage formats are long-lasting: FITS metadata for images, and other disciplinary standards (ASCII, FITS, standardized metadata) for tabular data. Using these formats guarantees that information systems can be reconstructed over time independently of the technologies in use, i.e. that the data are preserved in the long term. ASCII files are independent of the DBMS technology in use.
FITS (Flexible Image Transport System) is the standard data format used in astronomy to store, transport and archive data files. Its flexibility allows it to be used for a large variety of data types: tables, images, spectra, time series.
- The first version of FITS was released in 1981. Its evolution follows the "once FITS, always FITS" rule, meaning that developments of the format must not invalidate former existing FITS files.
- A FITS file is made of one or more Header + Data Units. Thus, metadata and data are kept together, the metadata being stored in ASCII as a set of keyword/value cards.
These two key aspects make FITS a very-well suited format for archiving and long-term preservation purposes.
More information about FITS can be found at http://fits.gsfc.nasa.gov/fits_overview.html
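To make the keyword/value card layout concrete, the following sketch parses a single 80-character FITS header card in plain Python. It is a simplified illustration (keyword in the first 8 characters, `= ` value indicator, optional comment after `/`) and does not cover the full standard (for example continued strings or complex values); the function names are hypothetical. A production reader would normally rely on an established FITS library instead.

```python
from typing import Optional, Tuple, Union

Value = Union[bool, int, float, str, None]


def _convert(raw: str) -> Value:
    """Convert a non-string value field to bool, int, float, or leave as text."""
    text = raw.strip()
    if text == "T":
        return True
    if text == "F":
        return False
    try:
        return int(text)
    except ValueError:
        pass
    try:
        # FITS allows a Fortran-style 'D' exponent in floating-point values.
        return float(text.replace("D", "E"))
    except ValueError:
        return text


def parse_card(card: str) -> Tuple[str, Value, str]:
    """Split one header card into (keyword, value, comment)."""
    if len(card) != 80:
        raise ValueError("a FITS header card is exactly 80 characters long")
    keyword = card[:8].rstrip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, blank) carry free text only.
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):
        # Quoted string: scan to the closing quote; '' is an escaped quote.
        i = body.index("'") + 1
        chars = []
        while i < len(body):
            if body[i] == "'":
                if i + 1 < len(body) and body[i + 1] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(body[i])
            i += 1
        value: Value = "".join(chars).rstrip()
        rest = body[i + 1:]
    else:
        raw, _, rest_tail = body.partition("/")
        value = _convert(raw)
        rest = "/" + rest_tail
    _, _, comment = rest.partition("/")
    return keyword, value, comment.strip()
```

Because every card is fixed-width ASCII, a reader like this needs no knowledge of the software that wrote the file, which is part of what makes the format suitable for long-term preservation.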
The data redundancy on external sites safeguards access against all internal risks.
As far as possible, we use recognized, sustainable open source software and systems (PostgreSQL, Linux OS, etc.), which are a guarantee of sustainability.
We also ensure regular migration of the technologies in use, as shown by the fact that the CDS started in 1972 and has maintained its data holdings and databases since then, including of course several major migrations.
Migration plan since 1972:
1972 - 1979
Server : IBM 360/65 of Meudon Observatory, unique computer in French astronomy
Storage : removable IBM 2314 diskpacks, 29 MB
2 disks at the beginning, 5 disks at the end
Backups : half inch magnetic tapes, 1600bpi and 6250bpi
1979 - 1981
Server : IBM Computer of the CNRS in Orsay
Storage : IBM disks 3330 or 3340 (?)
Backups : half inch magnetic tapes 6250bpi
1981 - 1984
Server : Univac 1108/1110 of the CNRS computer in Strasbourg
Storage : Univac disks 2x80 mega words of 36 bits
Backups : half inch magnetic tapes 6250bpi
1985 - 1990
Server : Univac 1110 of the Paris-Sud University (Orsay)
Storage : Univac disks.
Backups : half inch magnetic tapes 6250bpi
1990 - 1995
Server : DEC 5400 station at the Strasbourg Observatory
Storage : SCSI disks
Backups : exabyte cartridges (2.5 Gbytes at the beginning)
1995 - 2006
Servers : Several SUN stations (SPARC technology) at the Strasbourg Observatory
Storage : SCSI disks
Backups : DAT cartridges. Daily incremental backups, Weekly full backups
2007 - today
Servers : Intel and AMD CPU servers running Linux (Debian, Ubuntu, CentOS, Scientific Linux OS)
Storage : SCSI, SAS and Fibre Channel disks in RAID 1, 5 and 6
Backups : Managed at the observatory level on a server in another building
The "workflow" manages the data life cycle, retaining so-called "obsolete" data (mainly tabular data). A mechanism has been set up to keep track of the history of the distributed data. The main ingestion and modification stages affecting the catalogue metadata are logged, signed and dated.
Services in CDS are living information systems, and metadata can evolve in VizieR.
Catalogues become obsolete when the data producer provides a new version. "Obsolete" data keep their ID and remain accessible, with a link to the current version.
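The life cycle described above (obsolete catalogues keep their ID, point to the current version, and every stage is logged, signed and dated) can be sketched as a small data structure. The class names, fields and log format below are assumptions for illustration, not the VizieR implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class CatalogueRecord:
    catalogue_id: str
    obsolete: bool = False
    superseded_by: Optional[str] = None       # link to the current version
    history: List[str] = field(default_factory=list)


class Registry:
    """Hypothetical in-memory registry of catalogue records."""

    def __init__(self) -> None:
        self._records: Dict[str, CatalogueRecord] = {}

    def ingest(self, catalogue_id: str, user: str) -> CatalogueRecord:
        if catalogue_id in self._records:
            raise ValueError(f"{catalogue_id} is already registered")
        record = CatalogueRecord(catalogue_id)
        self._log(record, "ingested", user)
        self._records[catalogue_id] = record
        return record

    def supersede(self, old_id: str, new_id: str, user: str) -> CatalogueRecord:
        """Register a new version; the old record stays, flagged obsolete."""
        old = self._records[old_id]
        new = self.ingest(new_id, user)
        old.obsolete = True
        old.superseded_by = new_id
        self._log(old, f"superseded by {new_id}", user)
        return new

    def get(self, catalogue_id: str) -> CatalogueRecord:
        return self._records[catalogue_id]

    @staticmethod
    def _log(record: CatalogueRecord, action: str, user: str) -> None:
        # Each entry is dated (UTC timestamp) and signed (user name).
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        record.history.append(f"{stamp} {user}: {action}")
```

The key design point is that `supersede` never deletes or renames the old record: a reference to the old ID still resolves, and the reader is directed to the newer version through `superseded_by`.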
Example of an obsolete catalogue:
Responsibility for access to, and availability of, the data is managed through agreements with the data producers.
Mirror sites map:
http://cds.u-strasbg.fr//vizier-org/OAISTranslation.html (VizieR responsibility in the archival)
The published catalogues are indexed by a name that is unique, standardized and reserved. This name matches the reference used in journals or, for other catalogues, takes the form NN/DDDD (NN = Roman numeral from I to X according to the subject of the catalogue, DDDD = sequential number).
The catalogue nomenclature is persistent. Articles have their own DOIs.
Link explaining the nomenclature:
Example of link for a CDS table: http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/558/A18
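The naming scheme and the example link can be combined in a short sketch that classifies a catalogue name and builds the corresponding link. The `NN/DDDD` pattern follows the description above; the rule used to recognise journal names (a `J/` prefix followed by at least two further parts) is an assumption inferred from the single example `J/A+A/558/A18`, and the function names are hypothetical.

```python
import re

# NN/DDDD: Roman numeral I..X, a slash, then a sequential number.
COLLECTION_RE = re.compile(r"^(I|II|III|IV|V|VI|VII|VIII|IX|X)/(\d+)$")

# Base address taken from the example link in the text.
QCAT_BASE = "http://cdsarc.u-strasbg.fr/viz-bin/qcat"


def classify(name: str) -> str:
    """Return 'collection', 'journal' or 'unknown' for a catalogue name."""
    if COLLECTION_RE.match(name):
        return "collection"
    parts = name.split("/")
    # Assumed journal form: J/<journal>/<volume>[/<page or article>]
    if parts[0] == "J" and len(parts) >= 3 and all(parts):
        return "journal"
    return "unknown"


def qcat_url(name: str) -> str:
    """Build a link of the same shape as the example for a recognised name."""
    if classify(name) == "unknown":
        raise ValueError(f"unrecognised catalogue name: {name!r}")
    # The name is appended verbatim, as in the example link.
    return f"{QCAT_BASE}?{name}"
```

For the table cited above, `qcat_url("J/A+A/558/A18")` reproduces the example link exactly, while a malformed name raises an error instead of producing a dead link.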
Data are distributed in compliance with the standards of the discipline and are made available through the Astronomical Virtual Observatory. OAI harvesting is effective: the IVOA Registry of Resources is OAI-PMH compliant.
IVOA standards: http://www.ivoa.net
IVOA Registry: wiki.ivoa.net/twiki/bin/view/IVOA/IvoaResReg
User friendly search tool for catalog selection.
Great description (byte-by-byte) of records.
An integrity check is performed on the tabular data. It consists of a trigger-based audit of the CDS Postgres database, which updates a log table recording the Delete/Insert/Update transactions (date, user, table, IP address, software, data before the update and possibly the request).
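The trigger-based audit can be sketched as follows. To keep the example self-contained and runnable, it uses Python's built-in `sqlite3` module as a stand-in for Postgres; the table and column names are hypothetical, and SQLite cannot capture the user, IP address or client software that the CDS log is described as recording.

```python
import sqlite3


def build_audited_db() -> sqlite3.Connection:
    """Create an in-memory database whose data table is audited by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE catalogue_rows (
            id INTEGER PRIMARY KEY,
            ra_deg REAL,
            dec_deg REAL
        );
        CREATE TABLE audit_log (
            logged_at  TEXT DEFAULT CURRENT_TIMESTAMP,
            operation  TEXT,
            table_name TEXT,
            row_id     INTEGER,
            old_values TEXT            -- data before the change, if any
        );
        CREATE TRIGGER audit_insert AFTER INSERT ON catalogue_rows
        BEGIN
            INSERT INTO audit_log (operation, table_name, row_id, old_values)
            VALUES ('INSERT', 'catalogue_rows', NEW.id, NULL);
        END;
        CREATE TRIGGER audit_update AFTER UPDATE ON catalogue_rows
        BEGIN
            INSERT INTO audit_log (operation, table_name, row_id, old_values)
            VALUES ('UPDATE', 'catalogue_rows', OLD.id,
                    OLD.ra_deg || ',' || OLD.dec_deg);
        END;
        CREATE TRIGGER audit_delete AFTER DELETE ON catalogue_rows
        BEGIN
            INSERT INTO audit_log (operation, table_name, row_id, old_values)
            VALUES ('DELETE', 'catalogue_rows', OLD.id,
                    OLD.ra_deg || ',' || OLD.dec_deg);
        END;
    """)
    return conn


def logged_operations(conn: sqlite3.Connection) -> list:
    """Return the audited operations in the order they happened."""
    cursor = conn.execute("SELECT operation FROM audit_log ORDER BY rowid")
    return [row[0] for row in cursor]
```

After an insert, an update and a delete on `catalogue_rows`, `logged_operations(conn)` lists the three transactions in order. Because the logging lives in the database itself, it applies whichever client program makes the change.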
The procedure follows the diagram:
The on-line publishing of tabular data and images is carried out by qualified librarians, then approved and validated by astronomers who check the data quality.
The procedures are described in the document:
http://cds.u-strasbg.fr//vizier-org/OAISTranslation.html (Astronomers part in VizieR archival and Description of VizieR pipeline)
Data are submitted through a secure FTP service (vsftpd: very secure ftp daemon). The depositor creates a directory and places the files in it. This directory is invisible and is known only to its creator.
Finally, a program monitors uploads and sends an e-mail when data are added, and all transactions are logged (vsftpd.log).
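The upload monitoring step can be illustrated with a minimal polling sketch: it detects files added to a deposit directory and prepares a notification message. The function names, the polling approach and the addresses are assumptions; the text does not say how the CDS program is implemented.

```python
import os
from email.message import EmailMessage
from typing import List, Set


def scan_new_files(directory: str, seen: Set[str]) -> List[str]:
    """Return files under `directory` not seen before, and remember them."""
    current = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            current.add(os.path.relpath(full_path, directory))
    new_files = sorted(current - seen)
    seen.update(current)
    return new_files


def build_notification(directory: str, new_files: List[str],
                       sender: str, recipient: str) -> EmailMessage:
    """Prepare (but do not send) the e-mail announcing the new files."""
    message = EmailMessage()
    message["Subject"] = f"{len(new_files)} new file(s) uploaded to {directory}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("\n".join(new_files))
    return message
```

A monitoring loop would call `scan_new_files` at regular intervals and, when the result is not empty, hand the message to a mail server, for example with `smtplib.SMTP(host).send_message(message)`.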
The technical infrastructure of CDS explicitly supports the task and function described in a standard like OAIS.
The technical infrastructure of VizieR is in compliance with archival standard.
The following document describes procedures "à la OAIS":
The data licenses of producers are preserved.
Here is a link to the VizieR licence available to consumers:
The data provider name and reference of publication (when applicable) are attached to the data to allow users to reference the origin of data, in agreement with the accepted code of conduct of scientific research.
A good description of the code of conduct of scientific research can be found e.g. in the ethics statement of the American Astronomical Society:
The following paragraphs are particularly relevant to CDS activities:
"Proper acknowledgement of the work of others should always be given, and complete referencing is an essential part of any astronomical research publication. Authors have an obligation to their colleagues and the scientific community to include a set of references that communicates the precedents, sources, and context of the reported work. Deliberate omission of a pertinent author or reference is unacceptable. Data provided by others must be cited appropriately, even if obtained from a public database.
All authors are responsible for providing prompt corrections or retractions if errors are found in published works with the first author bearing primary responsibility.
Plagiarism is the presentation of others’ words, ideas or scientific results as if they were one’s own. Citations to others’ work must be clear, complete, and correct. Plagiarism is unethical behavior and is never acceptable.
These statements apply not only to scholarly journals but to all forms of scientific communication including but not limited to press releases, proposals, websites, popular books, and podcasts."
The data consumers are asked to quote the origin of data. The origin of data is available in the Readme file.
The CDS does not own the data and is not required to check for possible misuse, so no legal action will be taken in case of misbehaviour or misuse of data.
Data is openly available and measures such as termination of access are not feasible.
Should rather be "not applicable"