Data Curation Primer

What is data curation?

Data curation helps preserve the long-term value of research data by optimizing data for discoverability, interoperability, and reusability (Munoz & Renear, 2011; Johnston et al., 2018; Thomer et al., 2022). For more detail on data curation in practice, see the following definition from Zorich (1995):

“Data sets need to be examined for consistency, long-term quality and relevance over time, and new sources of data must be identified and assessed. Changes or updates to data require authentication and verification. Tools which support object databases, such as authority lists, thesauri, data dictionaries and other documentation resources, need to be maintained, updated and distributed at regular intervals, while data security and access must be considered. All these concerns constitute the discipline of data curation.”

Why does data curation matter?

Data curation is important for making sure that datasets are FAIR: findable, accessible, interoperable, and reusable (Wilkinson et al., 2016). Previous work has shown that curated data are more likely to be trusted and reproducible, and are more easily understood by other researchers and collaborators (Roche et al., 2015; McNutt, 2016; Smith & Roberts, 2016; Beagrie & Houghton, 2014).

In particular, curation of genetic and genomic sequence data is valuable because these data have immense reuse value for measuring genetic diversity. However, without metadata describing the spatial and temporal context of the sampled organism, the value of a genomic sequence is limited (Toczydlowski et al., 2021).
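
As a rough illustration of the point about spatial and temporal context, a curator might flag sequence records that lack this metadata. The sketch below is a minimal example, not a prescribed workflow: the field names (latitude, longitude, collection_date) and the sample records are hypothetical, and real repositories use their own metadata vocabularies.

```python
# Hypothetical field names chosen for illustration; actual repositories
# define their own metadata vocabularies.
REQUIRED_CONTEXT_FIELDS = ("latitude", "longitude", "collection_date")


def missing_context_fields(record):
    """Return the required context fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_CONTEXT_FIELDS:
        value = record.get(field)
        # Treat a missing key, None, or whitespace-only text as missing.
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Made-up sample records, for illustration only.
samples = [
    {"sample_id": "S1", "latitude": 44.97, "longitude": -93.23,
     "collection_date": "2019-06-14"},
    {"sample_id": "S2", "latitude": None, "longitude": None,
     "collection_date": ""},
]

for sample in samples:
    print(sample["sample_id"], missing_context_fields(sample))
```

A record that returns an empty list has all three context fields filled in; any other result lists the fields a curator would need to recover from the original data submitter or publication.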

GEODE CURATE-A-THON
