Documentation about data should be provided at both the study level and the data level.
Supporting information can often be found in laboratory notebooks, questionnaires and interview guides, final reports or catalogue metadata. It typically provides context for the data, instructions for its use and re-use, and information about any restrictions. Supporting information is often used to provide study-level documentation; see, for example, the documentation provided for the "Health Survey for England, 2010" at the UK Data Service.
Supporting information files are also especially suited to qualitative data, where they can assist with the anonymisation process. A particular example is a data list, which summarises the items within a data collection, assigns each item a unique identifier and records its biographical characteristics or main features. For interviews, a data list could include:
- interview ID
- age
- gender
- occupation
- location
- interview place
- interview date
- transcript file name
- recording file names.
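
As an illustration, a data list with the fields above could be kept as a simple CSV file. The following is a minimal sketch only: the column spellings, the file names and the two example rows are invented for illustration and do not come from any real study.

```python
import csv
import io

# Column names mirror the data list items above; the exact spellings are an assumption.
FIELDS = [
    "interview_id", "age", "gender", "occupation", "location",
    "interview_place", "interview_date", "transcript_file", "recording_files",
]

# Invented example rows, not real participants.
ROWS = [
    {
        "interview_id": "INT001", "age": 34, "gender": "female",
        "occupation": "nurse", "location": "Leeds",
        "interview_place": "workplace", "interview_date": "2011-03-14",
        "transcript_file": "INT001.rtf",
        "recording_files": "INT001_part1.wav;INT001_part2.wav",
    },
    {
        "interview_id": "INT002", "age": 51, "gender": "male",
        "occupation": "teacher", "location": "Bristol",
        "interview_place": "home", "interview_date": "2011-03-20",
        "transcript_file": "INT002.rtf",
        "recording_files": "INT002.wav",
    },
]


def build_data_list(rows):
    """Return the data list as CSV text, one row per interview."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


data_list_csv = build_data_list(ROWS)
print(data_list_csv)
```

Keeping the unique identifier in the first column makes it straightforward to link each row to its transcript and recordings without using the participant's name.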
Embedded documentation is best suited to data-level documentation and is most often found in quantitative (tabular) data. Data-level documentation can also be provided in separate files (for example, a README text file) inside a file archive containing a dataset.
Examples of embedded metadata include:
- headers and variable names
- units
- field labels
- value codes
- references to external classification schemes
- instructions on how derived variables have been created.
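
The items above can be sketched as a small codebook kept alongside a tabular dataset. This is a hypothetical example: the variable names, labels, value codes and the derivation formula are assumptions chosen for illustration, not taken from any particular survey.

```python
# Hypothetical codebook: variable names, labels, units and value codes are invented.
CODEBOOK = {
    "sex": {
        "label": "Sex of respondent",
        "unit": None,
        "codes": {1: "Male", 2: "Female", -9: "Refused"},
    },
    "height_cm": {"label": "Measured height", "unit": "cm", "codes": {}},
    "weight_kg": {"label": "Measured weight", "unit": "kg", "codes": {}},
    "bmi": {
        "label": "Body mass index",
        "unit": "kg/m^2",
        "codes": {},
        # Instruction describing how the derived variable was created.
        "derivation": "weight_kg / (height_cm / 100) ** 2",
    },
}


def describe(variable, value):
    """Return a readable description of a raw value using the codebook."""
    entry = CODEBOOK[variable]
    # Replace a coded value with its text where a value code exists.
    text = entry["codes"].get(value, value)
    unit = f" {entry['unit']}" if entry["unit"] else ""
    return f"{entry['label']}: {text}{unit}"


def derive_bmi(weight_kg, height_cm):
    """Compute the derived variable as documented in the codebook."""
    return weight_kg / (height_cm / 100) ** 2


print(describe("sex", 2))
print(describe("height_cm", 172))
print(describe("bmi", round(derive_bmi(70, 175), 1)))
```

Recording the derivation next to the variable means a secondary user can see how the value was produced without needing the original analysis scripts.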
Embedded metadata for qualitative data may take the form of headers in interview transcripts. Some metadata can also be embedded as document properties of a file (in Windows), and more extensive, structured metadata may be created using formats such as XML (see the UK Data Archive for an example extract from a UK Data Service DDI catalogue record in XML format).
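
To give a rough sense of what structured metadata in XML looks like, the sketch below builds a small record with Python's standard library. The element names (studyRecord, title, variables, variable, label) are invented for illustration and do not follow the DDI schema; a real catalogue record would use the element names defined by the relevant standard.

```python
import xml.etree.ElementTree as ET


def build_record(title, variables):
    """Build a small XML metadata record.

    Element names are hypothetical and are NOT DDI elements.
    """
    root = ET.Element("studyRecord")
    ET.SubElement(root, "title").text = title
    vars_el = ET.SubElement(root, "variables")
    for name, label in variables:
        # Each variable carries its name as an attribute and its label as text.
        var = ET.SubElement(vars_el, "variable", name=name)
        ET.SubElement(var, "label").text = label
    return ET.tostring(root, encoding="unicode")


record_xml = build_record(
    "Example interview study",
    [("age", "Age of participant in years"), ("occupation", "Current occupation")],
)
print(record_xml)
```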