
Find Data

Discover thousands of research datasets from across the disciplines in data repositories.

Thousands of data repositories house datasets from a wide range of research areas, which greatly supports open research. However, it also means that dataset publication is fragmented across those repositories, so you will usually need several search approaches to find relevant datasets. Moreover, each repository works slightly differently and will require a different search strategy. Once you have found a dataset, assess its quality and consider how you can reuse the data.

Repository and data search engines

  • Re3data is a registry of over 2500 research data repositories from every domain. You can search by over 40 attributes including subject, content type and licence to find a repository in your discipline.
  • DataCite provides a search function that can help you find datasets in your area. It searches across multiple repositories for datasets related to your subject (see the sketch after this list).
  • Google Dataset Search is similar to DataCite: it searches across repositories to find datasets that match your criteria.
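
If you want to search programmatically, DataCite also offers a public REST API. The sketch below is a minimal example in Python, assuming the https://api.datacite.org/dois endpoint and its query and resource-type-id parameters; check the current DataCite API documentation before relying on it.

    # Minimal sketch: search DataCite for datasets matching a keyword.
    # Assumes the public DataCite REST API at https://api.datacite.org/dois;
    # parameter names should be checked against the current documentation.
    import requests

    def search_datacite(keyword, page_size=10):
        response = requests.get(
            "https://api.datacite.org/dois",
            params={
                "query": keyword,                # free-text search term
                "resource-type-id": "dataset",   # restrict results to datasets
                "page[size]": page_size,         # records per page
            },
            timeout=30,
        )
        response.raise_for_status()
        for record in response.json().get("data", []):
            attrs = record.get("attributes", {})
            titles = attrs.get("titles") or [{}]
            print(attrs.get("doi"), "-", titles[0].get("title", "untitled"))

    search_datacite("soil microbiome")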

Multidisciplinary repositories

Searching multidisciplinary repositories can also help you find datasets in your field, thanks to their extensive content and coverage.

Code Ocean

Code Ocean is an open access platform for code and data. It allows users to develop, share, publish and download code through a web browser, eliminating the need to install software on personal computers.

data.ncl

Newcastle’s research data repository, data.ncl, is powered by Figshare. It contains a growing body of datasets developed by researchers at Newcastle and links to data held in other repositories. It is also used for software and code via an integration with GitHub.

Dryad

Dryad is a repository governed by a nonprofit membership organization. Several publishers partner with Dryad to coordinate the submission of manuscripts with the underlying data. 

Figshare

Figshare is a third-party repository owned by Digital Science. It is free for researchers to use, and an institutional version is used by a growing number of universities, including Newcastle University. It is also used for software and code via an integration with GitHub.

Mendeley Data

Mendeley Data is a third-party repository owned by Elsevier. Researchers can freely archive and share data.

Open Science Framework

Open Science Framework is an open source web application created by the non-profit Center for Open Science. It enables researchers to collaborate and to document, archive and share research outputs.

UK Data Archive

UK Data Archive is the UK's largest collection of digital research data in the social sciences and humanities. It is funded by the Economic and Social Research Council (ESRC), Jisc and the European Union (EU).

UK Data Service

UK Data Service is part of the UK Data Archive. It is the UK’s largest collection of social, economic and population data resources.

Zenodo

Zenodo is a data repository for EU-funded research, developed in partnership with CERN. It is available for anyone to use free of charge, irrespective of funder. It is also used for software and code via an integration with GitHub.

Data access statements

There is a growing drive for publications to include a data access statement outlining where and how the underlying data can be accessed. You may find related datasets in your area through the literature and then consider using the same repository for your own research.

Social media as data

Social media content is a rapidly growing and potentially rich source of research data. However, it presents legal, ethical and technical challenges. The fact that the information is on the internet does not mean it can automatically be accessed and reused. As the data involves human participants, you will need to address these methodological and ethical considerations in your data management plan and ethics application.

When considering and obtaining social media data you should:

  • check the Terms of Service of the host platform
  • request data using an API (application programming interface) if available (a hedged sketch follows this list)
  • undertake web scraping responsibly
  • ensure you have sufficient and appropriate storage for the volume of the data you will obtain
  • plan for the removal of direct identifiers
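
The sketch below illustrates the first two points and the removal of direct identifiers. The endpoint, access token and field names are hypothetical placeholders rather than any real platform's API; adapt it to the documentation, rate limits and Terms of Service of the platform you are using, and to your approved ethics protocol.

    # Illustrative sketch only: the endpoint, token and field names are
    # hypothetical placeholders, not a real platform's API.
    import time
    import requests

    API_URL = "https://api.example-platform.com/v1/posts"   # placeholder endpoint
    API_TOKEN = "YOUR_ACCESS_TOKEN"                          # issued by the platform

    def fetch_posts(query, max_pages=5):
        """Request matching posts page by page, pausing between calls."""
        headers = {"Authorization": f"Bearer {API_TOKEN}"}
        posts, cursor = [], None
        for _ in range(max_pages):
            params = {"q": query}
            if cursor:
                params["cursor"] = cursor
            response = requests.get(API_URL, headers=headers, params=params, timeout=30)
            response.raise_for_status()
            payload = response.json()
            posts.extend(payload.get("results", []))
            cursor = payload.get("next_cursor")
            if not cursor:
                break
            time.sleep(1)   # stay well within rate limits
        return posts

    def remove_direct_identifiers(post):
        """Drop fields that directly identify the author before storage."""
        return {key: value for key, value in post.items()
                if key not in {"username", "user_id", "display_name", "profile_url"}}

    posts = [remove_direct_identifiers(p) for p in fetch_posts("research data")]
    print(f"Retrieved {len(posts)} de-identified posts")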

Third party data

To acquire data from a third party, you may be required to complete a questionnaire or document. This normally aims to assure the data provider that the data will be stored and processed securely. If you are asked to complete one and need guidance on what to say, please contact us.