Secondary data
Using secondary data can be a good alternative to collecting data directly from participants (primary data), removing the need for face-to-face contact.
Secondary data relating to living human subjects often requires ethical approval depending on the source and nature of the data. The extent to which the ethical review application form must be completed also depends on the source and nature of the data.
This guidance covers some of the ethical issues relating to the use of secondary data and how these affect the ethical application process.
-
Ethical approval is required for projects where secondary data includes personal data - data that relates to identifiable living persons.
Data relating to the deceased
When data relates to deceased human subjects, ethical approval is required if the data includes either:
- sensitive personal data about living human subjects, or
- data relating to health or census information from the last 100 years.
This applies where the data identifies, or could identify, either the deceased individual or others.
Ethical review is required for reasons including:
- sensitive personal data can have implications for living relatives
- some data may be covered by Data Protection legislation.
Anonymised data
Data which are completely and robustly anonymised do not contain personal data and so ethical review and approval is usually not required.
For the avoidance of doubt, this means data that are already anonymised rather than data received in identifiable or pseudonymised form and then anonymised by the researcher.
However, there are scenarios involving anonymised data where ethical approval may be required (discuss with your School ethics committee if you are unsure):
Data from a source which requires assurances or additional approvals
If the data source requires assurances that the project has undergone ethical review or evidence that use of the data is legitimate, an ethical review application can be submitted.
If the data source requires a:
- Data Management Plan - contact Research Data Management.
- Data Protection Impact Assessment (DPIA) - contact Data Protection (dataprot@st-andrews.ac.uk).
If the data involves or originates from the NHS or health and social care, see the Research involving the NHS page.
Data which risk re-identification of individuals
If the data could be used to re-identify individuals, then an ethical review application may be needed - consider the items of data you will be working with and whether this is a risk.
For example:
Combined data - combining data can lead to re-identification of individuals, particularly if data is linked at an individual level by matching unique reference numbers or data points.
Rare, unusual, or low number data – rare or unique data, such as that relating to unusual characteristics or rare health conditions, are difficult to truly anonymise as there are often few individuals with those characteristics or conditions.
Reasonable means – GDPR suggests that, when assessing the risk of identification, researchers should consider the ‘means reasonably likely to be used’, accounting for factors such as the costs and time involved and the available technology.
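As an illustration of the "rare, unusual, or low number data" risk described above, the following is a minimal Python sketch of one common screening check: counting how many records share each combination of potentially identifying fields and flagging combinations shared by only a few records. The function name, the field names, the sample records, and the threshold `k` are all hypothetical and chosen purely for illustration; an appropriate threshold and choice of fields would depend on the dataset and on the data source's requirements.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return each combination of quasi-identifier values shared by fewer than k records.

    `k` is an illustrative threshold only, not a recommended or official value.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records for illustration only.
records = [
    {"age_band": "30-39", "region": "Fife", "condition": "asthma"},
    {"age_band": "30-39", "region": "Fife", "condition": "asthma"},
    {"age_band": "30-39", "region": "Fife", "condition": "diabetes"},
    {"age_band": "80-89", "region": "Orkney", "condition": "rare condition X"},
]

# Combinations of age band and region that fewer than 3 records share.
risky = small_groups(records, ["age_band", "region"], k=3)
for combo, n in risky.items():
    print(f"{combo} appears in only {n} record(s)")
```

A check like this does not show that data are safe; it only highlights combinations that may deserve a closer look when deciding whether an ethical review application is needed.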
Data with additional ethical considerations
If there are additional ethical considerations, an ethical review application can be submitted. For example, if data raises concerns around:
- the original participants’ consent for future use of the data
- the provenance of the data
- access to sensitive data not already in the public domain
- social profiling
- the research, data, or outcomes adversely impacting a particular group or community
See the section below on ethical considerations.
-
Secondary data – internal datasets
Secondary datasets may sometimes be sourced from within the University, i.e. data collected as part of previous projects within a School. It is important to consider whether re-use of this data is in line with the original ethical approval and the consent given by participants. Both an ethical amendment to the original ethical approval (to allow the data to be shared) AND a new ethical review application for the new research project (if sufficiently different) may be required.
Internally sourced data should still be acknowledged and appropriately referenced, and the same considerations given as to other secondary data sources such as around access and permissions, data management and confidentiality. Researchers should also consider whether using this type of secondary data is appropriate for their needs (i.e. whether it meets the requirements for an academic research project).
Secondary data - large quantitative datasets
Large quantitative datasets, such as census data, health data, household surveys and market research, are a commonly used source of secondary data.
There are several sources that can give access to these types of data and what is required to access them varies by source and by the nature of the data, for example:
- ‘open’ datasets where the data is freely available to download
- ‘closed’ datasets where users must register with the data source but that require minimal additional work
- datasets that contain more sensitive information and where users may have to complete paperwork such as a data management plan.
Sometimes more sensitive datasets can only be accessed via a secure web portal and no local copies retained.
Secondary data - qualitative and mixed-methods data
Secondary qualitative data is less common, largely due to the difficulty in anonymising qualitative data. However, there are sources of secondary qualitative data including the UK Data Service and library data such as oral histories, diaries and biographies.
Secondary data - biological data
There are several resources for access to biological data including human-related data. Use of biological data and bioinformatics is a wide area with several ethical concerns around confidentiality, the implications of research into DNA and genomics, bias and profiling, and the sensitivity of identifying risk levels related to disease. Researchers planning research involving biological data or bioinformatics should consult disciplinary guidelines and organisations, and colleagues with specific expertise. If using secondary data of this type, researchers must ensure they do so in accordance with the requirements of the data sources. Researchers should also check whether any NHS ethical approval, governance or R&D approvals are required.
-
Access, permissions and consent
Secondary data must always be accessed and used in accordance with the requirements of the data source, GDPR and the common law duty of confidentiality. Secondary data must always be appropriately referenced and acknowledged. Researchers should always act in accordance with the Principles of Good Research Conduct, even when working with secondary data.
Researchers should check whether their use is in line with the consent originally obtained from participants and seek assurances on this from the data source.
Where data is obtained in anonymous form, researchers should be conscious of the risk of de-anonymising data through triangulation of several data points or sets.
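To make the triangulation risk concrete, the sketch below shows, under stated assumptions, how an "anonymous" dataset might be linked to a second dataset that contains names when both share a few fields. All names here (`unique_links`, `index_by`, the field names, and the sample rows) are hypothetical and invented for illustration. The sketch reports only exact one-to-one matches on the shared fields; real linkage risks can also arise from approximate or probabilistic matching, which this does not cover.

```python
from collections import defaultdict


def index_by(rows, fields):
    """Group rows by the tuple of values they hold in `fields`."""
    index = defaultdict(list)
    for row in rows:
        index[tuple(row[f] for f in fields)].append(row)
    return index


def unique_links(anonymised, auxiliary, shared_fields):
    """Return (anonymised_row, auxiliary_row) pairs whose shared-field values
    occur exactly once in each dataset, i.e. rows that could be linked one-to-one."""
    anon_index = index_by(anonymised, shared_fields)
    aux_index = index_by(auxiliary, shared_fields)
    return [
        (anon_rows[0], aux_index[key][0])
        for key, anon_rows in anon_index.items()
        if len(anon_rows) == 1 and len(aux_index.get(key, [])) == 1
    ]


# Made-up data for illustration only.
survey = [
    {"postcode_area": "KY16", "birth_year": 1950, "diagnosis": "A"},
    {"postcode_area": "KY16", "birth_year": 1985, "diagnosis": "B"},
    {"postcode_area": "KY16", "birth_year": 1985, "diagnosis": "C"},
]
public_list = [
    {"name": "Person 1", "postcode_area": "KY16", "birth_year": 1950},
    {"name": "Person 2", "postcode_area": "KY16", "birth_year": 1985},
]

links = unique_links(survey, public_list, ["postcode_area", "birth_year"])
for survey_row, public_row in links:
    print(public_row["name"], "could be linked to diagnosis", survey_row["diagnosis"])
```

In this made-up example only the 1950 record is singled out, because two survey rows share the 1985 values. A check of this kind is best run on field names and structure before any combined dataset is actually created.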
While there are open access datasets that are freely available, it is common that there are conditions and requirements put in place by the data source or controller around who can access the data and how it is used. For example, this might include:
- that researchers sign terms of use
- that researchers have a comprehensive data management plan
- that researchers can provide assurances around the security of the data once in their possession
- verification that the person accessing the data has a legitimate reason i.e. evidence that you are a researcher at a recognised institution
- that the data be accessed via a secure portal
- that no local copies are retained
- that any copies of the data be destroyed within a certain timescale (may require a destruction certificate)
- that the raw data be processed by the data source into an anonymised form before it is released
In the latter examples, where there are more complex requirements and the data source is providing a service such as preparing and moderating access, this may incur costs that would need to be factored into researchers’ plans and budgets.
-
Ethical issues to consider
The ethical application form includes an early filter question on use of secondary sources. This means that if researchers are using secondary data with no additional ethical issues they can skip to the end of the form – the declarations section. If, however, there are ethical issues, researchers should describe these and how they will be mitigated in the ‘Ethical Considerations’ free text field later in the form.
If data are particularly sensitive, or it is required by the data source, researchers may wish to complete the Data Management section of the ethical review application form (Word) or a separate data management plan.
When making an application for ethical approval of research using secondary data, researchers should consider:
- Is the proposed research in line with the participants’ original consent? Can the data source provide assurances on participants’ original consent?
- How will the data be managed? If there is identifiable, personal or sensitive data, how will confidentiality be maintained and data kept secure?
- Will the proposed research and the use, management and storage of the data meet the data source’s requirements? Have all the appropriate documents been completed and permissions granted?
- Will the data source be acknowledged and referenced?
- Are there any copyright issues around the data?
- By pulling together several data sources, is there any risk of de-anonymising participants?
- Will using this data or combining it with other data risk bias or ‘profiling’ of a particular group?
- How will you present the data or analysis? Will this ensure the confidentiality and anonymity of participants?
- Will the data identify individuals as being at risk of a condition or disease where they may have otherwise been unaware?
You may find parts of the UK Government's Data Ethics Framework useful for exploring some of the potential issues.
-
Data sources
The UK Data Service – this is one of the core UK sources of secondary data, including government data such as the Household Survey, plus an increasing amount of qualitative data and data collected as part of research funded by UK research councils https://www.ukdataservice.ac.uk/
The Office for National Statistics – this is the UK’s recognised national statistics institute and conducts the census in England and Wales, amongst other large national and regional surveys https://www.ons.gov.uk/
The Scottish Government’s statistics publications – these are often aggregated statistics reporting regional level (rather than individual level) data, though some more detailed datasets are available for older data https://www.gov.scot/publications/?publicationTypes=statistics&page=1
NHS Digital data and statistics publications – this includes details about clinical indicators, health and social data, though again this is often aggregated and at a regional level rather than individual level data https://digital.nhs.uk/data-and-information/data-collections-and-data-sets/data-sets
Information Services Division (ISD) Scotland – this includes Scottish health and social care data, often aggregated and at a regional level https://www.isdscotland.org/
Data.gov.uk – a new resource for ‘open’ UK government data https://data.gov.uk/
British Library – the British Library holds a number of collections including oral histories, biographies and newspaper articles https://www.bl.uk/collection-guides/oral-history#
Qualitative Data Repository – a qualitative data repository hosted by Syracuse University https://qdr.syr.edu/
European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI) https://www.ebi.ac.uk/
Health Informatics Centre (HIC) – local health informatics service linking health data https://www.dundee.ac.uk/hic/
Open access data directories
OpenAire.eu – A searchable directory of open access datasets such as those accompanying publications https://explore.openaire.eu/
JISC Directory of Open Access Repositories (OpenDOAR) – a searchable directory of open access repositories http://v2.sherpa.ac.uk/opendoar/
-
- Association of Internet Researchers – ethics guidance
- The European Commission (2018) – Use of previously collected data (‘secondary use’). Ethics and Data Protection, VII, 12-14
- Irwin, S. (2013). Qualitative secondary data analysis: Ethics, epistemology and context. Progress in development studies, 13(4), 295-306.
- Morrow, V., Boddy, J., & Lamb, R. (2014). The ethics of secondary data analysis. NCRM Working Paper. NOVELLA.
- Rodriquez, L. (2018). Secondary data analysis with young people. Some ethical and methodological considerations from practice. Children’s Research Digest, 4(3). The Children’s Research Network.
- Salerno, J., Knoppers, B. M., Lee, L. M., Hlaing, W. M., & Goodman, K. W. (2017). Ethics, big data and computing in epidemiology and public health. Annals of epidemiology, 27(5), 297-301.
- UK Data Service guidance on secondary analysis