How to Cite Data

1. What is a data citation?

Data citation is a relatively new concept proposed by the publishing and data-sharing communities in China and abroad. It treats data as citable material: like cited literature, a dataset used in an article is listed and described in the reference section. The citation format for a dataset is generally provided by the data center.



2. Why cite data?


1) Reflecting the intellectual property of data

Data is both the result of manual labor and the product of intellectual activity, so it carries intellectual property rights. Data sharing must protect these rights in order to protect data producers and keep sharing sustainable. The core intellectual property rights related to data are the rights of authorship, distribution, and recompilation, of which authorship is the most basic. In academia, the traditional literature citation is the best way to reflect the right of authorship and is widely accepted by scientists, so data citation can reflect the intellectual property of data in the same way.


2) Data usage tracking, resource interconnection and data evaluation

With data citations (or in-text annotations), a search can find which publications use a dataset. The data center aggregates this information and feeds it back to the data provider, protecting the data provider's most basic right to know.


Literature citations link publications to one another and support knowledge mining and discovery. Citing data likewise establishes links between data and literature, enabling broader knowledge discovery and mining.


The current research evaluation system does not evaluate data as a research output, so many data owners have little motivation to share their data. Citing or labeling data links the data to the literature, and impact indicators (such as SCI) can then be established for scientific data from the number of times the data is used (cited). This allows the contribution of data workers to be evaluated more accurately and gives data providers a fundamental incentive to participate in data sharing.



3. How to cite data

The data center suggests two alternative formats: a reference-list citation and an in-text annotation.


1) Reference-list form (recommended)

If the journal allows data to be cited in the reference list, please use this approach:


The citation includes five basic elements, which data users can adapt to the format requirements of the journal. This is the recommended method.


"Data author list. Data title. Data publisher/releasing organization. Data publication/release date. Data DOI (permanent address)."


For example:

Ran Youhua, Li Xin. Chinese Vegetation Functional Map (1 km). Cold and Arid Regions Science Data Center, 2011. doi:10.3972/westdc.001.2013.db [Ran Youhua, Li Xin. Plant functional types map in China. Cold and Arid Regions Science Data Center at Lanzhou, 2011. doi:10.3972/westdc.001.2013.db]


Another example:


Li Xin, Ma Mingguo, Wang Weizhen, Huang Guanghui, Zhang Zhihui, Tan Junlei. Heihe Integrated Remote Sensing Joint Experiment: Eddy Covariance Flux Dataset at the Dayekou Guantan Forest Station. Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences. 2008. doi:10.3972/water973.0294.db [Li Xin, Ma Mingguo, Wang Weizhen, Huang Guanghui, Zhang Zhihui, Tan Junlei. WATER: Dataset of eddy covariance observations at the Dayekou Guantan forest station. Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences. 2008. doi:10.3972/water973.0294.db]
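
As a rough illustration of the five-element format above, the following Python sketch joins the elements in the suggested order. The class name DataCitation and its field names are hypothetical and chosen only for this example; the data center does not prescribe any tooling, and the punctuation may need adjusting to a journal's own style.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the five basic citation elements."""
    authors: list[str]   # data author list
    title: str           # data title
    publisher: str       # data publisher / releasing organization
    year: str            # publication / release date
    doi: str             # DOI, written without the "doi:" prefix

    def format(self) -> str:
        # Join the first four elements with ". " and append the DOI last.
        # Trailing periods are stripped first so they are not doubled.
        parts = [", ".join(self.authors), self.title, self.publisher, self.year]
        body = ". ".join(part.strip().rstrip(".") for part in parts)
        return f"{body}. doi:{self.doi}"


# Values taken from the first example citation in this section.
citation = DataCitation(
    authors=["Ran Youhua", "Li Xin"],
    title="Plant functional types map in China",
    publisher="Cold and Arid Regions Science Data Center at Lanzhou",
    year="2011",
    doi="10.3972/westdc.001.2013.db",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than as one string, makes it easier to rearrange them when a journal asks for a different order or punctuation.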


2) In-text annotation form

If the journal does not allow data in the reference list, the data center recommends an in-text annotation instead: mention the dataset where it is used in the body of the article and attach its DOI as a label.

For example, in the body of the article:


... A 1:100,000 desert (sand) distribution dataset (doi:10.3972/westdc.0294.db) was used ...


Another example:


...Plant functional types map in China (doi:10.3972/westdc.001.2013.db) ...


After annotating the body, please also state the source of the data in the acknowledgments.


4. Types of references provided by the data center


1) Direct reference to data

Data centers provide two ways to cite a dataset: citing the data directly or citing related literature directly. The literature to be cited directly is usually chosen by the data authors, who typically designate one or two articles. See, for example, the Chinese vegetation functional map (1 km) dataset.


2) Suggested references for data

Data centers often also list literature that is closely related to the data. These publications are generally supplied by the data authors and may cover the research background, preparation process, processing methods, or quality evaluation of the data. They are also recommended to users of the data.


3) Publications by data users

Publications by data users are reported to the data center by the users themselves or collected by the data center, and are also listed on the dataset's introduction page. Users can thus see how other users have applied the data and what feedback they gave.




5. Other notes on data citation

1) What to cite. A data citation may point to the data description document, the related literature, or the data itself, and there is no unified standard in China or abroad. In this situation a preferred citation method has to be chosen for each dataset, and the Cold and Arid Regions Science Data Center gives priority to the user's suggestion.


2) Because data citation is a new practice, many journals or editorial offices do not accept data citations in the reference section. In that case, so that the use of the data can still be reported accurately, the data center suggests that users label and describe the data source in the body of the text.


3) Data is described in the main text in various ways: sometimes in a data-source statement, sometimes in the acknowledgments. For tracking purposes the location does not matter: as long as the "search term" is accurate, the description can be found through full-text search. In many cases, however, data users do not follow the suggested format, and once the "search term" changes, the search results become inaccurate. The Cold and Arid Regions Science Data Center therefore uses DOIs to track data citations accurately: a DOI is registered for each dataset, and users are asked to include the DOI in their text description. However the surrounding text changes, the DOI can still be traced to the publication.
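
The DOI-based tracking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the data center's actual tracking system: the function names find_dois and cites_dataset are hypothetical, and the regular expression is a simplification that fits DOIs written as "doi:10.xxxx/..." like those quoted in this document.

```python
import re

# Assumed pattern: "doi:" (optionally followed by spaces), then a DOI of the
# form 10.<registrant>/<suffix>. The suffix stops at whitespace, brackets,
# commas, or semicolons. This is a simplification, not a full DOI validator.
DOI_PATTERN = re.compile(r"doi:\s*(10\.\d{4,9}/[^\s()\[\],;]+)", re.IGNORECASE)


def find_dois(text: str) -> list[str]:
    """Return the distinct DOIs labeled in the text, in order of appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # DOIs are case-insensitive, so normalize to lower case; also drop a
        # sentence-ending period that may have been captured.
        doi = match.group(1).rstrip(".").lower()
        if doi not in found:
            found.append(doi)
    return found


def cites_dataset(text: str, doi: str) -> bool:
    """Check whether the text labels the given dataset DOI anywhere."""
    return doi.lower() in find_dois(text)


# Sample text built from the two in-text annotation examples in section 3.
sample = (
    "... A 1:100,000 desert (sand) distribution dataset "
    "(doi:10.3972/westdc.0294.db) was used ... "
    "Plant functional types map in China (doi: 10.3972/westdc.001.2013.db)."
)
print(find_dois(sample))
print(cites_dataset(sample, "10.3972/westdc.0294.db"))
```

Because the match depends only on the DOI string, the dataset is still found when the author rewords the title or moves the description to the acknowledgments, which is the property the data center relies on.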