The Data Seal of Approval Board hereby confirms that the Trusted Digital Repository TalkBank complies with the guidelines version 2014-2017 set by the Data Seal of Approval Board.
The aforementioned repository has therefore acquired the Data Seal of Approval of 2013 on April 8, 2014.
The Trusted Digital Repository is allowed to place an image of the Data Seal of Approval logo corresponding to the guidelines version date on its website. This image must link to this file, which is hosted on the Data Seal of Approval website.
The Data Seal of Approval Board
|Guidelines Version:||2014-2017 | July 19, 2013|
|Guidelines Information Booklet:||DSA-booklet_2014-2017.pdf|
|All Guidelines Documentation:||Documentation|
|Seal Acquiry Date:||Apr. 08, 2014|
|For the latest version of the awarded DSA for this repository please visit our website:||
|Previously Acquired Seals:||None|
|This repository is owned by:||
A General Description of TalkBank
TalkBank is an archive of transcripts of spoken language interactions, many of which are linked to either audio or video. The major designated communities involved include child language researchers, aphasiologists, linguists, conversation analysts, and second language acquisition researchers. Long-term data preservation is provided by Carnegie Mellon University and CLARIN (www.clarin.eu). Several of the CLARIN centers have received the Data Seal of Approval, and TalkBank data is currently mirrored by the CLARIN Center at the MPI in Nijmegen, which holds the Data Seal of Approval. The only outsourcing we do is for data mirroring to guarantee preservation. This project has been funded continuously by the National Institutes of Health since 1984 and has also received support from the National Science Foundation and the MacArthur Foundation. A search of scholar.google.com shows that there are now 4350 published articles based on use of the TalkBank databases. Current NIH support involves three major ongoing five-year grants for child language, aphasia, and phonology. The central website is http://talkbank.org. Within the overall TalkBank corpus, there are several subcorpora, the largest and oldest of which is CHILDES (Child Language Data Exchange System), located at http://childes.talkbank.org.
In the responses to the Guidelines, “we” refers to the programming and data analysis staff employed by the TalkBank Project at Carnegie Mellon. The term “producers” refers to the scholars who contribute data. The term “users” refers to the scholars who use the data. All URLs were visited on Tuesday, January 7, 2014.
Data producers are the scholars who have collected the spoken interactions and produced the transcripts and media that are then included in TalkBank. We support the data producers and guarantee data quality through these methods:
Corpora that meet all of these standards are judged to be valuable and are included in TalkBank.
1. Using the documentation provided by producers, we create metadata files for each resource. For harvesting by OLAC (Open Language Archives Community, http://www.language-archives.org), we produce a single metadata file for each corpus that is included in the relevant .zip file that can be downloaded. For harvesting in the IMDI/CMDI framework at http://www.clarin.eu/content/component-metadata, we use a program built into CLAN to automatically generate metadata records for each transcript and media file. These can be seen at http://talkbank.org/data-imdi/talkbank/ and http://childes.talkbank.org/data-imdi/childes/
2. We do not require data producers to generate these OLAC and IMDI/CMDI metadata files. We do this using the data they provide.
3. We enforce a quality check as we create these files.
4. Our metadata formats comply with the two major standards for linguistic metadata documentation, namely OLAC and CMDI. Both include Dublin Core as a subset.
5. The primary use of metadata is for resource discovery through OLAC and IMDI. Secondary analysis depends on use of the CLAN programs themselves.
6. It is possible that data producers will have failed to collect or transcribe some data that will turn out in the future to be important. However, because we have the raw media for most of our new corpora, transcriptions can be refined later on. None of these issues should lead to problems in terms of long-term preservation.
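As a rough illustration of item 4, the sketch below builds a small Dublin Core record in Python. It is not the CLAN metadata generator and does not follow the full OLAC or CMDI schemas, which require additional structure omitted here. The function name `build_dc_record`, the `metadata` wrapper element, and all field values are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a <metadata> element with one dc:* child per (name, value) pair.

    Hypothetical helper for illustration; a real OLAC or CMDI record
    needs a schema-specific wrapper and further elements.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical corpus description, not taken from any TalkBank corpus.
record = build_dc_record([
    ("title", "Example Corpus"),
    ("creator", "Example, Producer"),
    ("language", "eng"),
    ("type", "Text"),
    ("rights", "CC BY-NC-SA 3.0"),
])

# Serialize the record to an XML string for inspection or storage.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because both OLAC and CMDI can carry Dublin Core elements, a core record of this kind could in principle serve as a common starting point for both harvesting routes.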
Issues relating to disclosure risk are discussed in detail between the Director and the Data Producer.
We would recommend the preparation of procedural documentation (ideally shared online) that reflects these procedures.
We recommend the provision of supporting documentation for this item before the next DSA submission.
We rely on the Creative Commons license CC BY-NC-SA 3.0, as noted on our homepages. For data in the AphasiaBank segment of TalkBank that carry a possible disclosure risk, we control access by password and rely on three security measures:
1. Password access is only given to full-time faculty or clinicians with SLP (Speech-Language Pathology) certification from ASHA (the American Speech-Language-Hearing Association). Students can only access data under faculty supervision.
2. Faculty must apply for membership in AphasiaBank and state their intended use of the data.
3. Members agree to the Ground Rules given at http://talkbank.org/share/