Journal of Global Change Data & Discovery, 2017, 1(1): 3-11

Citation: Liu, C., Guo, H. D., Uhlir, P., et al. GCdataPR: Infrastructure for Data Publishing & Sharing in/for/with Developing Countries [J]. Journal of Global Change Data & Discovery, 2017, 1(1): 3-11. DOI: 10.3974/geodp.2017.01.02.

GCdataPR: Infrastructure for Data Publishing Repository & Sharing in/for/with Developing Countries

Liu, C.1*  Guo, H. D.2  Uhlir, P. F.3  Ge, Q. S.1  Zhou, X.2  Shi, R. X.1  Gong, K.4  Imbuga, M.5  Gu, X. F.2  Odido, M.6  Liao, X. H.1  Chen, J.7  Doko, T.8 
Chen, W. B.8  Hodson, S.9  Minster, J. B.10  Madela-Mntla, E.11  Hasan, N.12
Jiang, D.1  Zhu, Y. Q.1  Wang, C. L.2  Wittenburg, P.13  Chu, W. B.14  Xu, X. L.1 He, S. J.1  Lv, T. T.2  Singh, R. B.15  Tikunov, V.16  Wang, Q.17

1. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China;

2. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China;

3. US National Academy of Sciences, Washington, D.C. 20541, USA;

4. Nankai University, Tianjin 300071, China;

5. Jomo Kenyatta University of Agriculture and Technology, Nairobi P.O. Box 62000-00200, Kenya;

6. UNESCO, Nairobi P.O. Box 72107-00200, Kenya;

7. National Geomatics Center of China, Beijing 100830, China;

8. Keio University, Fujisawa 2520882, Japan;

9. CODATA, Paris 756116, France;

10. University of California at San Diego, La Jolla, California 92093, USA;

11. ICSU African Regional Office, Pretoria 0020, South Africa;

12. ICSU Regional Office for Asia and the Pacific, Kuala Lumpur 50450, Malaysia;

13. Max Planck Computing and Data Facility, Munich 85748, Germany;

14. Group on Earth Observations, Geneva 1211, Switzerland;

15. University of Delhi, Delhi 110007, India;

16. Moscow State University, Moscow 119991, Russia;

17. Satellite Environment Center, Ministry of Environmental Protection, Beijing 100094, China

 

Abstract: Global change research data are valuable resources for science and sustainability. Protecting data authors’ interests and opening access to these data are the keys to sustainable data sharing. The Global Change Data Publishing & Repository (GCdataPR), sponsored by the Institute of Geographic Sciences and Natural Resources Research (IGSNRR), Chinese Academy of Sciences (CAS) and the Geographical Society of China (GSC), creates a new mechanism for publishing data, including metadata, datasets (data products) and data papers, on a peer-review basis, together with a repository for long-term preservation. Although the data content determines the data value in general, data publishing can add significant value to the data in most cases, for example by protecting the data authors’ intellectual property rights (IPRs), securing data quality, preserving and enhancing the data for reuse over the long term, and promoting the data authors’ impact. Doing data work locally and networking globally makes local content data available on global digital networks through mechanisms such as assigning a Digital Object Identifier (DOI) to each dataset, and through links to institutions and systems such as the Global Earth Observation System of Systems (GEOSS), the China GEOSS, the Data Citation Index of the Web of Science (DCI), ResearcherID, CODATA, the World Data System (WDS), as well as 36 Chinese earth science journals. Data management policies and infrastructure play a central role in furthering the data-intensive sciences and in international cooperation with developing countries.

Keywords: global change research; data publishing; data repository; data infrastructure; data policy; developing countries

1 Introduction

Since the World Summit on the Information Society (WSIS)[1-4] (Geneva, 2003; Tunis, 2005), the Committee on Data for Science and Technology of the International Council for Science (CODATA/ICSU) approved the establishment of the Task Group on Preservation of and Open Access to S&T Data in/for/with Developing Countries (CODATA PASTD), co-chaired by William Anderson (USA) and Liu, Chuang (China), in 2002[5]. Over 15 years, the Task Group held a series of regional and national international workshops and training courses in China, Brazil, Colombia, Cuba, Mongolia, South Africa, Kenya, Ethiopia and India. The topics focused on national strategies and policies for research data (2002-2008), capacity building for data management (2008-2014), and data infrastructure issues (2014-present).

The International Workshop on Open Data for Science and Sustainability in Developing Countries (OpenDataSSDC), led and sponsored by the CODATA PASTD and a number of international organizations, was held on 6-8 August 2014 in Nairobi, Kenya[6]. The participants agreed on a common understanding of Data Sharing Principles in Developing Countries (The Nairobi Data Sharing Principles)[7] and on enhancing the data infrastructure for data publishing and sharing in developing countries, following practices such as those of the GCdataPR and the World Data Center in South Africa. The government of Kenya announced that an advanced data center would be established at the Jomo Kenyatta University of Agriculture and Technology (JKUAT).

In December 2015, the GCdataPR and the Editorial Board of Acta Geographica Sinica jointly announced that research papers would be published together with their original data, in both the journal and the repository[8]. Following that, in March 2016, the GCdataPR and 35 Chinese earth science journals agreed to publish papers linked with their supporting datasets and to make them openly available to all[9,10].

H.E. Mr. Sam Kahamba Kutesa, President of the 69th session of the United Nations General Assembly, invited Liu Chuang to give a speech on the GCdataPR at the United Nations in New York on 2 July 2017[11]. Research data publishing and sharing in developing countries were also discussed at the Science & Technology for Society Forum in April 2016 in Colombo, Sri Lanka, and at the National Conference of the Geographical Society of India in November 2016 at Nalanda, India.

In June 2016, Clarivate Analytics[12] informed the GCdataPR editors that the repository had been listed in the Data Citation Index of the Web of Science[13,14]. During the summer of 2016, the GCdataPR was appointed as the National Earth Observation Data Publishing Center[15] by the National Remote Sensing Center of the Ministry of Science and Technology of P. R. China, and as a Data Provider and Broker of the Global Earth Observation System of Systems (GEOSS)[16]. Furthermore, the GCdataPR was approved as the 68th Regular Member of the World Data System of the International Council for Science in October 2016[17].

Framing the topic as acting locally and networking globally, Liu Chuang spoke at the 11th Internet Governance Forum (IGF) in December 2016 to illustrate how the GCdataPR servers are used for publishing and sharing local content[18]. As one of the GCdataPR’s best practices, the Data Quality Action Lines were demonstrated at the First UN World Data Forum in January 2017 in Cape Town, South Africa[19]. The GCdataPR was also selected by the Ministry of Information and Industry of P. R. China as one of the Top 50 Best Practices and Solutions on Big Data in March 2017.

According to the statistics kept by the GCdataPR, as of the middle of March 2017, 519 authors from 10 nations had published 290 datasets; the repository had recorded more than 390,000 accesses by 21,000 users from 53 nations, and 75,000 data files had been downloaded free of charge[20].

2 Data Publishing: a New Mechanism for Opening, Disseminating, and Sharing Research Data

Since the World Data Center system was established during the International Geophysical Year (IGY, 1957-1958)[21], the data center mechanism has played an important role in research data management and sharing. The US Global Change Research Act[22] indicated that a full and open policy should be the basic principle for US global change research data management and sharing. The US Distributed Active Archive Centers (DAACs)[23] and the China National Information Infrastructure for Science & Technology[24,25] basically follow this established data center mechanism. The key elements of this type of data center mechanism are:

(1) government provides the funding and makes the policy to support a sustainable data center(s);

(2) data centers take the responsibility for data collection, long-term archiving, and data services on a full and open basis;

(3) scientists who are financially supported by the government, or even by private foundations, should archive any datasets they create in the data centers, which are supported by the government; and

(4) the data should be provided to users free of charge or at no more than the marginal cost of distribution.

Since the WSIS, and especially in the data-intensive sciences[25], increasing attention has been paid to the challenges of data quality and timely data availability, not only in China but all over the world. CODATA, ICSU, UNESCO, and many other international organizations hosted a series of meetings in order to improve the data center mechanism[26]. As one of the solutions, the Digital Object Identifier (DOI) Foundation[27] was established in 2005; the DOI became an international standard for scientific literature[28] and now, increasingly, for the data themselves. Many countries, including China, established DOI registration services[29]. Other research data initiatives soon followed, including the data publishing interest group of the WDS and the Research Data Alliance (RDA)[30] and the Data Publishing subgroup of CODATA PASTD. Data publishing outlets were also created, including, for example, the journal Scientific Data published by Springer Nature[31], the China Western Data Center[32], the GCdataPR, Geodata[33], and Chinese Scientific Data[34]. All of these actions indicate that new mechanisms for the open dissemination and sharing of research data are now being created, and they portend a new era of data publishing and sharing.

3 Data Publishing: Metadata, Dataset, and Data Paper Publishing and Repository Coordination

Scientific communities have taken actions in data publishing, including the metadata, datasets (products) and data papers, using the following modes:

(1) Paper attachment mode[35]

Some journals, such as Journal of Environment Science and Technology, request authors to publish their papers together with the related datasets. In this mode, the dataset is published only because it supports the discovery reported in the paper.

(2) Dataset (data product) registration with DOI mode[36]

Millions of datasets (data products) have been registered with DOIs all over the world during the last five years. Normally, the data centers take on this responsibility, and they focus their attention on the data products rather than on data papers.

(3) The mode of data publishing and repository for both datasets and data papers

In order to assure data quality, it is important to recognize that the datasets and the coordinated data papers are of equal importance for the users of research outputs. It is therefore essential to coordinate the publishing of the data papers with the repository of the datasets. The journal Scientific Data (SD) and the GCdataPR have used such a methodology since 2014. The difference between them is that SD publishes data descriptors (papers) and coordinates the archiving of datasets in a group of more than 90 distributed data repositories[37], whereas the GCdataPR consists of the Journal of Global Change Data & Discovery for publishing data papers and a repository for publishing the datasets, operated together.

4 Value-added Services of Data Publishing and the Related Repository

4.1 Value of Data Publishing

Although the data content determines the value of the data in general, in most cases data publishing can add considerable value as well. The role of data publishing in adding value in China can be summarized as follows:

(1) Protect the data author’s intellectual property rights for sharing the data

The first function of data publishing is protecting the data author’s intellectual property rights[38]. It uses the IPRs to enforce a common-use license for sharing the data with the users. The GCdataPR uses the DOI prefix 10.3974 and assigns each dataset and coordinated data paper an individual identifier. This helps promote the GCdataPR data sharing[39] and citation policy, under which all users provide appropriate credit to the data authors. The DOIs also play a critical role in protecting the IPRs in the data so that they can be shared.

(2) Assure data quality

According to GCdataPR policy, all of the data, including metadata, datasets and data papers, should go through the coordinated peer-review procedure before they are published. In many cases, multiple rounds of feedback and editing of the dataset and data paper are necessary.

(3) Preserve the data for the long-term

In many cases, datasets get lost for some reason, including the updating of a computer, a change in the software, the graduation of students or retirement of senior researchers, and so on. The data publisher, such as GCdataPR, together with the digital repository, will play a long-term preservation role to secure the dataset and make it actively available for access and use.

(4) Enhance the data and data authors’ impact

Disseminating the data on behalf of the authors is the data publisher’s major contribution. For example, the GCdataPR disseminates the data worldwide, not only by itself, but also by linking to other relevant systems through the metadata, which enhances the impact of both the data and the data authors.

4.2 Process of Data Publishing

The GCdataPR follows a series of procedures in implementing data publishing and the repository. The first step is for the author submitting the data for publishing to agree to the GCdataPR’s data publishing policies. The procedure then includes assessing the data by peer review, including coordination of the data records and the data paper; assessing the data formats and the nature of the files; assessing the methodology of dataset development, including data visualization, data archiving, and registration in the DOI system; putting the data online and providing associated services; tracking data statistics; and providing other related functions. A flowchart of these data publishing processes is shown in Figure 1 below.

Figure 1  Flowchart of Global Change Research Data Publishing & Repository

5 Data Dissemination and Sharing Policies

5.1 Channels of Data Dissemination

The GCdataPR (http://www.geodoi.ac.cn) is an independent, multifunctional system for data publishing, repository, and sharing. Acting locally, networking globally is the basic concept of GCdataPR data management. The channels of data dissemination as of March 2017 include the following (Figure 2):

(1) Internet of IDs: Networking to DOI[26]

As noted above, the GCdataPR is responsible for assigning each dataset an ID under the DOI prefix 10.3974/. Based on the agreement between DOI and IGSNRR/CAS, an interoperable system was established between the two servers: as soon as the data have passed the peer-review procedure in the GCdataPR, the dataset DOI is automatically registered and becomes searchable in the DOI system from anywhere in the world.

(2) Internet of datasets: Networking to the Data Citation Index of Clarivate Analytics[22]

The Data Citation Index (DCI) on the Web of Science is one of the powerful systems developed by Clarivate Analytics. It provides a single point of access to quality research data from repositories across disciplines and around the world[26]. The GCdataPR was recognized by Clarivate Analytics as one of the qualified repositories in June 2016. The powerful functions of the DCI will help the GCdataPR to disseminate data through the internet.

(3) Internet of authors: Networking to ResearcherID

Clarivate Analytics provides a ResearcherID system for all authors. Any one of the authors can get a ResearcherID after he/she registers in the system, thereby promoting the collection of the author’s datasets and publications, as well as the citations.

(4) Internet of journals: Networking to 36 earth science journals

By March 2017, there were 36 earth science journals linking to the GCdataPR. They are Acta Geographica Sinica; Journal of Geographical Sciences; Journal of Natural Resources; Geo-Information Science; Resources Science; Geographical Research; Progress in Geophysics; Research on Heritages and Preservation; Journal of Resources and Ecology; Progress in Geography; Acta Meteorologica Sinica; Journal of Palaeogeography (Chinese Edition); Chinese Journal of Geophysics; Chinese Journal of Plant Ecology; Chinese Journal of Atmospheric Sciences; Journal of Meteorological Research; Chinese Geographical Science; Scientia Geographica Sinica; Wetland Science; Mountain Research; Journal of Mountain Science; Journal of Arid Land Resources and Environment; Journal of Meteorology and Environment; Chinese Journal of Polar Research; Advances in Polar Science; Tropical Geography; Journal of Lake Sciences; Arid Land Geography; Arid Zone Research; Journal of Arid Land; Remote Sensing Information; Acta Ecologica Sinica (Chinese Version); Acta Ecologica Sinica (International Journal); Ecosystem Health and Sustainability; Journal of Jilin University (Earth Science Edition) and Journal of Global Change Data & Discovery.

(5) Internet of professional networks: Networking to GEOSS

In October 2016, the GCdataPR was designated as China’s earth observation data publishing center by the National Remote Sensing Center of China, of the Ministry of Science and Technology of P. R. China. It is also one of the data providers and brokers of GEOSS (Global Earth Observation System of Systems) and a partner of AOGEOSS (Asia-Oceanic GEOSS). The inter-server operation between the GCdataPR and the GEOSS Portal connects the data of the two systems.

(6) Networking to the World Data System (WDS/ICSU)

At the time of this writing, there were 58 regular members in the World Data System of the International Council for Science, and the GCdataPR has been one of them since October 2016. The WDS/ICSU is one of the important channels through which the GCdataPR disseminates its data.

The GCdataPR not only opens its data on its official website, but also networks the data to value-added services around the world through the internet of IDs, internet of datasets, internet of authors, internet of journals, and internet of professional networks. As an example, the boundary data of the Qinghai-Tibet Plateau[40], developed by Professor Zhang, Y. L., et al., are used to illustrate how Acting Locally, Networking Globally works.

Acting locally, Networking globally:

Figure 2  Chart of GCdataPR networking
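The "Internet of IDs" channel described above depends on every dataset DOI being resolvable from anywhere. As a minimal sketch, assuming the public resolver at https://doi.org/ (a general DOI convention, not something specified in this paper), the following Python snippet checks whether an identifier falls under the GCdataPR prefix 10.3974 and builds its resolver link. The function names are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote

GCDATAPR_PREFIX = "10.3974"    # DOI prefix used by the GCdataPR (stated in the text)
RESOLVER = "https://doi.org/"  # assumption: the public DOI resolver


def is_gcdatapr_doi(doi: str) -> bool:
    """Return True if the DOI has the GCdataPR prefix and a non-empty suffix."""
    prefix, sep, suffix = doi.strip().partition("/")
    return prefix == GCDATAPR_PREFIX and sep == "/" and suffix != ""


def resolver_url(doi: str) -> str:
    """Build a resolver link for a DOI; no network request is made."""
    return RESOLVER + quote(doi.strip(), safe="/")


# Dataset DOI cited as reference [40] in this paper.
example = "10.3974/geodb.2014.01.12.V1"
print(is_gcdatapr_doi(example))
print(resolver_url(example))
```

The sketch only validates the prefix and formats a link; whether a given DOI is actually registered can be confirmed only by querying the DOI system itself.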

5.2 Data Sharing Policies

Data from the Global Change Research Data Publishing & Repository includes metadata, datasets (data products), and publications (in this case, in the Journal of Global Change Data & Discovery). Based on the data sharing principles of WDS/ICSU, GEOSS, the Nairobi Data Sharing Principles, and the China Big Data Strategy, the GCdataPR data sharing policy is:

(1) Data are openly available and can be freely downloaded via the internet;

(2) Users are encouraged to use data subject to citation;

(3) Users, who are by definition also value-added service providers, are welcome to redistribute data subject to written permission from the GCdataPR Editorial Office and the issuance of a data redistribution license; and

(4) If the accessed data are used to compile new datasets, the ‘ten per cent principle’ should be followed, so that the data records utilized should not surpass 10% of the new dataset contents, while sources should be clearly noted in suitable places in the new dataset.
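The ‘ten per cent principle’ in item (4) can be expressed as a simple arithmetic check. The sketch below is illustrative only: the policy text does not define how "data records" are counted, so the code assumes that both the reused records and the new dataset are measured in the same unit (for example, rows), and the function name is hypothetical.

```python
def within_ten_percent(reused_records: int, total_records: int,
                       limit_percent: int = 10) -> bool:
    """Check that reused records do not exceed limit_percent of the new dataset.

    Assumption: both counts use the same unit (e.g. rows of a table).
    Integer arithmetic is used to avoid floating-point rounding at the boundary.
    """
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if not 0 <= reused_records <= total_records:
        raise ValueError("reused_records must be between 0 and total_records")
    return reused_records * 100 <= limit_percent * total_records


# A new dataset of 1,000 records may draw at most 100 records from the accessed data.
print(within_ten_percent(100, 1000))
print(within_ten_percent(101, 1000))
```

Passing the check covers only the quantitative part of the policy; the sources must still be clearly noted in the new dataset.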

5.3 Cost Policy

The GCdataPR is operated on an open access model, and the costs of data publishing, repository, and sharing are covered as follows:

(1) IGSNRR/CAS and GSC will cover the costs of the editorial office, communications, data repository, facilities, and related equipment;

(2) authors publish metadata and datasets (data products) at the GCdataPR free of charge, and data papers in the Journal of Global Change Data & Discovery for a fee;

(3) independent author(s) from the least economically developed countries, and young scientists under 35 years of age from developing countries, may publish their data (metadata, datasets, data papers) free of charge; and

(4) the GCdataPR may receive grants from the public or private sectors through the IGSNRR/CAS and the Geographical Society of China.

6 Conclusion

Global environmental change is a key concern of earth science and society. Global change research data can be reused for many different applications. There is an urgent need to implement infrastructure for data publishing and data repositories in developing countries. The GCdataPR provides one approach to such a fundamental infrastructure. The new mechanism for metadata, dataset and data paper publishing and sharing will greatly help make quality, timely data available to all[41]. The new technology and methodology will support the infrastructure for networking local content more efficiently. The GCdataPR’s practical data management policies could serve as an example and provide best practices for making research data open around the world, and especially in developing countries.

References

[1] United Nations. Geneva Declaration of Principles, Building the Information Society: a global challenge in the new Millennium [OL]. 2003. http://www.itu.int/net/wsis/docs/geneva/official/dop.html.

[2] United Nations. Geneva Plan of Action [OL]. 2003. http://www.itu.int/net/wsis/docs/geneva/official/poa.html.

[3] United Nations. Tunis Commitment [OL]. 2005. http://www.itu.int/net/wsis/docs2/tunis/off/7.html.

[4] United Nations. Tunis Agenda for the Information Society [OL]. 2005. http://www.itu.int/net/wsis/docs2/tunis/off/6rev1.html.

[5] CODATA PASTD TG [OL]. http://www.codata.org/task-groups/preservation-of-and-access-to-scientific-and-technical-data-in-for-with-developing-countries-pastd.

[6] Workshop on Open Data for Science and Sustainability in Developing Countries [OL]. http://www.codata.org/events/workshops/workshops-2014/open-data-in-developing-countries-nairobi.

[7] CODATA-PASTD. International Workshop on Open Data for Science and Sustainability in Developing Countries [OL]. http://www.codata-pastd.org/Scientific.html.

[8] Editorial Office of Acta Geographica Sinica and GCdataPR. Declaration on Joint efforts of global change research paper and data publishing [OL]. http://www.geodoi.ac.cn/WebCn/NewsInfo.aspx?ID=39.

[9] Liu, C., He, S. J. Joint Agreement on Global Change and Earth Science Journals and Data Publishing [OL]. http://www.geodoi.ac.cn/WebCn/NewsInfo.aspx?ID=40.

[10] Liu, C., He, S. J., Feng, Y. W., et al. Claim Scientific Discovery with both Paper & Data [OL]. http://www.geodoi.ac.cn/WebEn/NewsInfo.aspx?ID=42.

[11] Liu, C. Enhancing the Joint Efforts on Data Sharing in Developing Countries [OL]. http://www.geodoi.ac.cn/WebEn/NewsInfo.aspx?ID=31.

[12] Clarivate Analytics [OL]. http://clarivate.com/.

[13] Web of Science [OL]. http://clarivate.com/scientific-and-academic-research/research-discovery/web-of-science/.

[14] Clarivate Analytics. DCI: Data Citation Index [OL]. http://wokinfo.com/products_tools/multidisciplinary/dci/.

[15] GCdataPR. The Letter of Interest on China GEOSS Data Publishing Center was signed between the National Remote Sensing Center, Ministry of Science and Technology of P. R. China (NRSCC) and Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences.

[16] GEOSS Portal [OL]. http://www.geoportal.org/.

[17] WDS Regular Members [OL]. https://www.icsu-wds.org/community/membership/regular-members.

[18] IGF 2016: Enabling Inclusive and Sustainable Growth [OL]. http://www.intgovforum.org/multilingual/content/igf-2016.

[19] The 1st UN World Data Forum [OL]. http://undataforum.org/.

[20] GCdataPR [OL]. http://www.geodoi.ac.cn.

[21] The International Geophysical Year [OL]. http://www.nas.edu/history/igy/.

[22] Global Change Research Act of 1990, Public Law 101-606 (11/16/90), 104 Stat. 3096-3104, 1990 [OL]. http://www.globalchange.gov/.

[23] NASA. GES DISC: Goddard Earth Sciences Data and Information Center [OL]. https://disc.gsfc.nasa.gov/.

[24] National Earth System Sciences Data Sharing Platform [OL]. http://www.geodata.cn/.

[25] Guo, H. D., Wang, L. Z., Chen, F., et al. Scientific big data and digital earth [J]. Chinese Science Bulletin, 2014, 59(12): 1047-1054.

[26] Ad-hoc Strategic Coordinating Committee on Information and Data. Interim Report to the ICSU Committee on Scientific Planning and Review [M]. ISBN 978-0-930357-85-6, 2011.

[27] International DOI Foundation (IDF) [OL]. http://www.doi.org/.

[28] ISO 26324:2012: Information and documentation - Digital object identifier system [OL]. https://www.iso.org/standard/43506.html.

[29] Institute of Scientific and Technical Information of China (ISTIC). DOI China [OL]. http://www.doi.org.cn/portal/index.htm.

[30] RDA/WDS Publishing Data IG [OL]. https://www.rd-alliance.org/groups/rdawds-publishing-data-ig.html.

[31] Scientific Data [OL]. http://www.nature.com/sdata/.

[32] Data Center of Cold and Dry region [OL]. http://westdc.westgis.ac.cn/.

[33] GeoScience Data Journal [OL]. http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060/homepage/ForAuthors.html.

[34] Journal of China Scientific Data [OL]. http://csdata.org/p/.

[35] Journal of Environment Science and Technology [OL]. http://pubs.acs.org/loi/esthag.

[36] Global Biodiversity Information Facility (GBIF) [OL]. http://www.gbif.org/.

[37] Scientific Data Recommended Repositories [OL]. http://www.nature.com/sdata/policies/repositories.

[38] Copyright Law of P. R. China [OL]. http://www.gov.cn/flfg/2010-02/26/content_1544458.htm.

[39] GCdataPR Editorial Office. GCdataPR data sharing policy [OL]. DOI: 10.3974/dp.policy.2014.05 (Updated 2017).

[40] Zhang, Y. L., Li, B. Y., Zheng, D. Datasets of the boundary and area of the Tibetan Plateau [DB/OL]. Global Change Data Publishing & Repository, 2014. DOI: 10.3974/geodb.2014.01.12.V1.

[41] Liu, C. Global Change Research Data Publishing & Repository [J]. Acta Geographica Sinica, 2014, 69(sup): 3-11.
