In the evolving landscape of open science and scholarly communication, data repositories have become essential tools for research transparency, data preservation, accessibility, and reproducibility. This article explains what data repositories are, why they matter, and how they are used, and gives an overview of notable repositories worldwide that researchers can use to store, share, and cite their data.
Data repositories are digital archives that store datasets and related metadata. They serve as platforms where researchers can deposit the raw and processed data underlying their research findings. These repositories ensure that data is preserved over time and accessible to others for verification, reuse, or further analysis.
Repositories support a range of data formats; some specialize in particular disciplines, while others accept datasets from any field. Most assign Digital Object Identifiers (DOIs) or other persistent identifiers, which make datasets citable and allow their reuse to be tracked.
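A DOI becomes useful once it appears in a citation. The Python sketch below normalizes a DOI written in any of several common forms and assembles a data citation. The element order (creators, year, title, version, publisher, identifier) loosely follows the format DataCite recommends, but the exact punctuation is an assumption, and `normalize_doi` and `format_data_citation` are hypothetical helper names. The regular expression is only a loose sanity check, not a full DOI validator.

```python
import re

# Loose check for a bare DOI: "10.", a numeric registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written that are stripped before validation.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Return a bare DOI such as '10.1234/example' (hypothetical helper)."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi


def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Build 'Creators (Year). Title. Version X. Publisher. https://doi.org/DOI'.

    The element order loosely follows DataCite's recommended citation;
    the exact punctuation here is an assumption, not a formal style.
    """
    parts = [f"{'; '.join(creators)} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{normalize_doi(doi)}")
    return " ".join(parts)
```

With hypothetical inputs, `format_data_citation(["Doe, Jane"], 2024, "Example dataset", "Example Repository", "doi:10.1234/example-dataset", version="1.0")` returns `Doe, Jane (2024). Example dataset. Version 1.0. Example Repository. https://doi.org/10.1234/example-dataset`. Always check the citation text the repository itself suggests, since it may differ.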
Transparency & Reproducibility: Allow others to validate findings by accessing raw data.
Open Science Compliance: Meet funding agency or journal requirements.
Increased Citations: Several studies have found that papers with publicly available data tend to receive more citations.
Long-Term Preservation: Protect data from loss due to institutional or personal transitions.
Enhanced Collaboration: Facilitate reuse and interdisciplinary research.
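Long-term preservation depends partly on being able to show that a file has not changed. Many repositories record a checksum for each deposited file, and depositors can do the same before uploading. The Python sketch below computes such a digest using only the standard library; SHA-256 as the default is an assumption (use whichever algorithm your repository reports), and `file_checksum` and `verify_checksum` are hypothetical helper names.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file (hypothetical helper).

    The file is read in 1 MiB chunks so large datasets do not have to
    fit in memory at once.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="sha256"):
    """Compare a file against a previously recorded digest (case-insensitive)."""
    return file_checksum(path, algorithm) == expected.strip().lower()
```

A typical workflow is to record the digest of each file before deposit and compare it against the value the repository displays, or against a freshly downloaded copy, later on.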
Disciplinary Repositories – Specific to fields like genetics, social sciences, or economics.
Institutional Repositories – Managed by universities and research institutions.
Generalist Repositories – Accept all types of datasets regardless of discipline.
Government or Policy Repositories – Run by national agencies or NGOs.
Below is a list of widely used and trusted data repositories across disciplines:
Zenodo – Developed by CERN and OpenAIRE; accepts all disciplines.
Figshare – Offers storage for papers, datasets, and presentations.
Dryad – Data underlying scientific publications.
Harvard Dataverse – Platform for sharing, citing, and preserving data.
Open Science Framework (OSF) – Offers file storage and collaboration tools.
GenBank – Nucleotide sequences.
ICPSR – Inter-university Consortium for Political and Social Research.
PANGAEA – Earth and environmental science.
Protein Data Bank – Structural biology data.
NeuroMorpho.Org – Neural morphology data.
Columbia Academic Commons – Columbia University's institutional repository.
Data.gov – U.S. government open data portal.
European Data Portal – EU-wide public data resources.
DataCite is a global nonprofit organization that provides DOIs for research data. Repositories that integrate with DataCite include:
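TIB AV-Portal – Scientific videos, hosted by TIB (German National Library of Science and Technology).
DaRUS – Data repository of the University of Stuttgart.
Mendeley Data – Generalist data repository operated by Elsevier.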
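Because DataCite exposes DOI metadata through a public REST API, a dataset's citation details can be retrieved programmatically. The Python sketch below builds the lookup URL, fetches the record, and extracts the fields needed for a citation. The endpoint `https://api.datacite.org/dois/` is DataCite's documented REST API. The attribute names (`titles`, `creators`, `publisher`, `publicationYear`) reflect its JSON:API response format as we understand it and should be checked against DataCite's current documentation. `datacite_url`, `fetch_doi_record`, and `summarize_record` are hypothetical helper names. Only DataCite-registered DOIs resolve here; Crossref DOIs for journal articles will not.

```python
import json
import urllib.parse
import urllib.request

# DataCite's public REST API endpoint for single DOI records.
DATACITE_API = "https://api.datacite.org/dois/"


def datacite_url(doi: str) -> str:
    """Build the lookup URL for a bare DOI, keeping '/' unescaped."""
    return DATACITE_API + urllib.parse.quote(doi, safe="/")


def fetch_doi_record(doi: str, timeout: float = 10.0) -> dict:
    """Download the JSON record for a DataCite DOI (requires network access)."""
    with urllib.request.urlopen(datacite_url(doi), timeout=timeout) as response:
        return json.load(response)


def summarize_record(record: dict) -> dict:
    """Pick out citation fields from a DataCite JSON:API response (assumed shape)."""
    attrs = record["data"]["attributes"]
    publisher = attrs.get("publisher")
    # Depending on request options, publisher may be a plain string or an object.
    if isinstance(publisher, dict):
        publisher = publisher.get("name")
    titles = attrs.get("titles") or [{}]
    return {
        "doi": attrs.get("doi"),
        "title": titles[0].get("title"),
        "creators": [c.get("name") for c in attrs.get("creators", [])],
        "publisher": publisher,
        "year": attrs.get("publicationYear"),
    }
```

Network access is isolated in `fetch_doi_record`, so `summarize_record` can be tested on a saved response; a call such as `summarize_record(fetch_doi_record(doi))` would then work for any DOI minted through a DataCite-integrated repository.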
When selecting a data repository, consider the following:
Disciplinary Relevance: Specialized repositories may offer better discoverability.
Licensing Options: Look for repositories supporting Creative Commons.
Metadata Standards: Check that the repository uses established metadata schemas and assigns persistent identifiers such as DOIs, so your data can be cited properly.
Sustainability: Check if the repository is backed by reliable institutions.
Embargo and Access Control: Check for embargo periods and restricted-access options, which are important for sensitive data.
Data repositories have become indispensable in ensuring the transparency, reproducibility, and longevity of academic research. With an increasing emphasis on open data by funders and journals, using trusted data repositories is not only best practice but often a requirement. Whether using a generalist platform like Zenodo or a discipline-specific repository like GenBank, researchers have a rich array of tools to responsibly manage and share their data.
For best practices on uploading, licensing, and citing your data, consult each repository's own documentation.
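Many repositories also accept deposits programmatically. As one example, Zenodo offers a REST API whose deposit metadata includes fields for the upload type, creators, access level, embargo date, and license. The Python sketch below only assembles such a payload and checks a few consistency rules (an embargoed deposit needs a date, a restricted one needs access conditions); it does not contact Zenodo. The field names follow Zenodo's legacy deposit API as we understand it and should be verified against Zenodo's current developer documentation, and `build_deposit_metadata` is a hypothetical helper, not part of any Zenodo SDK.

```python
import datetime
from typing import List, Optional

# Access levels accepted by Zenodo's legacy deposit API (as we understand it).
ACCESS_RIGHTS = {"open", "embargoed", "restricted", "closed"}


def build_deposit_metadata(
    title: str,
    creators: List[str],
    description: str,
    access_right: str = "open",
    embargo_date: Optional[datetime.date] = None,
    access_conditions: Optional[str] = None,
    license_id: Optional[str] = None,
) -> dict:
    """Assemble a dataset metadata payload (hypothetical helper, offline only)."""
    if access_right not in ACCESS_RIGHTS:
        raise ValueError(f"unknown access_right: {access_right!r}")
    metadata = {
        "upload_type": "dataset",
        "title": title,
        # Creator names are conventionally written "Family, Given".
        "creators": [{"name": name} for name in creators],
        "description": description,
        "access_right": access_right,
    }
    if access_right == "embargoed":
        if embargo_date is None:
            raise ValueError("an embargoed deposit needs an embargo_date")
        metadata["embargo_date"] = embargo_date.isoformat()
    if access_right == "restricted":
        if not access_conditions:
            raise ValueError("a restricted deposit needs access_conditions")
        metadata["access_conditions"] = access_conditions
    if license_id is not None:
        # License identifiers must match the repository's own vocabulary.
        metadata["license"] = license_id
    return {"metadata": metadata}
```

Validating the payload locally catches common mistakes, such as requesting an embargo without a release date, before an authenticated request is sent to the repository.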
Compiled by: SERN
Contact: editor@sern.online
Website: www.sern.online