It is important that researchers are aware of the responsibilities that using a data collection entails. This is especially the case where data has been obtained from restricted or subscription sources. It will also apply to data that includes confidential content.
Researchers need to think about:
- how they will use data in practice
- how data is being combined (where multiple sources are being used)
- how it will be presented during and after analysis
- whether data is being appropriately stored and backed up.
An obligation in working with data is to be open and transparent about what was used and in what way. One way of doing this is to keep the usage of the data well documented. This is an important stage in developing habits that demonstrate research integrity and, where appropriate, support reproducibility and verification. Such documentation may then be expanded to describe the whole of the research process.
A well-documented research project will be:
- a useful resource for the data creator during and after the research;
- evidence of a commitment to research transparency;
- a foundation of good data management.
The UK Data Service has created a resource pack, 'Dissertations and their data: promoting research integrity' (PDF), which aims to introduce the idea of transparency in research into undergraduate teaching.
Terms and conditions
Some data collections attract particular restrictions, but there are also general terms and conditions that should be observed as part of good research practice.
Databases or data collections should be:
- used for academic research only
- not used for commercial purposes
- not shared with anyone else
- destroyed once used and not re-used for other projects
- properly attributed so original sources may be consulted if necessary.
In most cases, this should be straightforward. However, if you have any concerns you should raise these with your own Subject Librarian.
Situations where more clarification may be needed include:
- developing a research project with commercial potential or collaborators
- fulfilling requirements from funding bodies that research data be preserved
- creating a working environment that satisfies restricted-access data suppliers' requirements for secure access.
Data citation is rapidly emerging as a key practice supporting data preservation, access and reuse, as well as sound scholarship. The motivation to cite datasets arises from a recognition that data generated and archived in the course of research are just as valuable to the ongoing academic discourse as papers and monographs. This view is shared by research institutions, funding councils and a growing number of publishers.
The Data Citation Synthesis Group (FORCE11) has published a set of data citation principles, the Joint Declaration of Data Citation Principles. This is a formal statement that pulls together practices in common use in the research and publishing arenas. The declaration comprises eight principles that stress the importance and legitimacy of data, the need to give scholarly credit to contributors and the importance of data as evidence.
Cited data should have a unique and persistent identifier, such as a Digital Object Identifier (DOI), which is the equivalent of an ISBN for data. These are issued by data repositories such as ORA-Data. Visit Research Data Oxford for more details.
Here is an example of citation for an existing ORA-Data deposit:
Tomkins, D. & Jackson, A. (2015) “Ephemera and the British Empire - colour illustrations”. Oxford University Research Archive. doi:10.5287/bodleian:xp68kg235
or a citation from the ESRC’s Economic and Social Data Service (ESDS):
University of Essex. Institute for Social and Economic Research and National Centre for Social Research, Understanding Society: Wave 1, 2009-2010 and Wave 2, Year 1 (Interim Release), 2010 [computer file]. 3rd Edition. Colchester, Essex: UK Data Archive [distributor], February 2012. SN: 6614, http://dx.doi.org/10.5255/UKDA-SN-6614-3
In short, when citing data, include the author(s), title, year of deposit, repository or distributor, and the DOI (a persistent identifier) or other access location. Make sure your citation includes enough information to find the data easily.
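The elements listed above can be sketched as a small Python helper that checks the required parts are present before building a citation string. This is only an illustration: the `DataCitation` class, its field names and the output layout are assumptions made for this example, not a format prescribed by ORA-Data or any other repository, so always follow the citation wording your data source recommends.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Illustrative container for the elements of a data citation.

    The class and field names are hypothetical; they simply mirror the
    elements named in the guidance (authors, title, year, repository,
    DOI or other access location).
    """
    authors: list[str]
    title: str
    year: int
    repository: str
    doi: Optional[str] = None
    access_location: Optional[str] = None  # used only when there is no DOI

    def render(self) -> str:
        # Refuse to build a citation that is missing a core element.
        if not self.authors or not self.title or not self.repository:
            raise ValueError("author(s), title and repository are required")
        if not (self.doi or self.access_location):
            raise ValueError("give a DOI or another access location")
        names = " & ".join(self.authors)
        locator = f"doi:{self.doi}" if self.doi else self.access_location
        return f'{names} ({self.year}) "{self.title}". {self.repository}. {locator}'


# Values taken from the ORA-Data example earlier in this guide.
citation = DataCitation(
    authors=["Tomkins, D.", "Jackson, A."],
    title="Ephemera and the British Empire - colour illustrations",
    year=2015,
    repository="Oxford University Research Archive",
    doi="10.5287/bodleian:xp68kg235",
)
print(citation.render())
```

Keeping the elements as separate fields, rather than as one free-text string, makes it easier to spot a missing DOI or repository name before the citation reaches a bibliography.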
An exhaustive guide from the Digital Curation Centre (DCC), How to Cite Datasets and Link to Publications, discusses data citation in great detail, with information for researchers and data repositories.
Data archives may provide guidelines on how to cite the data. Some websites provide this information on individual dataset pages. More frequently, the website or database where you found your data will also have information on how to cite that data in their FAQs, 'About' page, or 'How to Use' information.
Other guides include:
- data citation in The Dataverse Project
- a general guide to citing data from DataCite
- Quick Guide to Data Citation from IASSIST
- Data Citation of Evolving Data by Research Data Alliance
MANTRA, a research data management training course, offers an interactive training module which introduces the concepts of documentation and metadata, including:
- why documenting your research data is important, and why documentation is important for using others’ data,
- why and when to use metadata,
- the importance of citing data, and how to do it.
Copyright and data
Copyright, an intellectual property right assigned automatically to the creator, prevents unauthorised copying and publishing of an original work. Under the Copyright, Designs and Patents Act 1988, copyright applies to research data, which falls under the category of Literary, dramatic and musical works, and plays a role when creating, sharing and re-using data. However, amendments made in 2014 (see section 29A) and the Government's official guidance note changed how copyright and intellectual property law affects researchers, notably by introducing an exception for text and data analysis in non-commercial research.
You can find data mining guides and tools on the Future TDM project website.
Additional advice on general obligations of using or sharing data and more specifically on citation of data may be found on the Research Data Oxford website.