What are research data?
The UKRI Concordat on Open Research Data defines research data as evidence that underpins the answer to the research question.
These might be quantitative information or qualitative statements collected by researchers in the course of their work through experimentation, observation, modelling, interview or other methods, such as data extraction from existing evidence. Data may be raw or primary (for example, taken directly from measurement or collection), derived from primary data (for example, cleaned up or extracted from a larger dataset), or derived from existing sources where the rights may be held by others.
There can be different implications for working with and preserving different types of research data:
- Observational: data captured in real time, for example, neuro-images, sample data, sensor data, survey or interview data. These data are usually irreplaceable and hard or impossible to re-create.
- Experimental: data captured from laboratory equipment by the researcher or by a service they use, for example, gene sequences, chromatograms, chemical toroid magnetic field data. The data are often reproducible, but reproduction could be costly.
- Simulation: data generated from test models, for example, climate, mathematical or economic models. The datasets involved are usually very large, but the model code itself might be sufficient to reproduce the results.
- Derived or compiled: data produced from existing data, for example, the outputs of text and data mining, 3D models, compiled databases. The data are reproducible, but reproduction could be costly.
- Reference or secondary: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated elsewhere for re-use. For example, gene sequence databases, chemical structures, or spatial data portals.