Data citation
Data citation is important because it:
- acknowledges the author's sources
- makes identifying data easier
- promotes the reproduction of research results
- makes it easier to find data
- allows the impact of data to be tracked
- provides a structure that recognises and rewards data creators.
It is good practice to cite any existing datasets you use. If you use data from a repository that has been released under an open license, you are usually obliged to cite it and should do so with a full citation. A CC0 license waives the legal requirement for attribution, but you should still cite CC0 data in full as a matter of good scholarly practice. It is also good practice to cite datasets published in a data journal with a full citation.
By citing the data paper, you also reward the author for sharing their data, because these citations can be tracked in the same way as citations to any scholarly paper. You should therefore include a reference to the data paper describing the data, followed by a reference to the data in the repository itself. For the citations to be tracked, they must appear in the references section of the article and include the DOI (or whichever other identifier the repository uses).
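As a rough illustration, the short Python sketch below assembles a dataset reference that ends with a resolvable DOI. The field order is loosely modelled on the commonly used "Creator (Year): Title. Publisher. Identifier" pattern; the function name `format_data_citation` and every value passed to it are hypothetical placeholders, not a real dataset or a required style. Your publisher's own reference style takes precedence.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a dataset reference as 'Creator (Year): Title. Publisher. DOI link'.

    Illustrative layout only (an assumption, not a mandated format).
    """
    # Join multiple creators with semicolons.
    authors = "; ".join(creators)
    # Write the DOI as a full link so readers and citation trackers can resolve it.
    return f"{authors} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Hypothetical placeholder values, not a real dataset or DOI.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example",
)
print(citation)
```

In a reference list, an entry like this would follow the entry for the data paper that describes the dataset.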
Our guidance on data access statements has moved to a new section, Data access statements, and has been updated to include more examples.