Journal of Global Change Data & Discovery, 2020. 4(1): 95-101


Citation: Shi, R. X., Ma, J. H., Liu, C., et al. Statistics and Analysis of the Global Change Research Data Publishing & Sharing (2019)[J]. Journal of Global Change Data & Discovery, 2020. 4(1): 95-101. DOI: 10.3974/geodp.2020.01.16.


Statistics and Analysis of the Global Change
Research Data Publishing & Sharing (2019)

Shi, R. X., Ma, J. H., Liu, C.*, Zhang, Y. H., Wang, Z. X., Shen, Y.

Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

 

 

Abstract: This paper summarizes the achievements of the Global Change Research Data Publishing & Repository (GCdataPR) in 2019 with respect to datasets, data authors, data-related discovery papers, and data sharing. In 2019, 162 datasets were published in 6 issues of GCdataPR, comprising 13,866 data files (428 compressed packages). The online data size is 71.72 GB (36.41 GB compressed). In terms of geographic coverage, 5 datasets are global, 2 are trans-continental, and 138 (85.19% of the total) cover Asia. A total of 202 new dataset authors, mainly from China, joined GCdataPR in 2019. Among the published datasets, 35 are directly related to data papers, and 46 research or discovery papers are directly related to datasets. In 2019 there were 5,205 new IP users and 76,510 downloads, with a downloaded data size of 553.43 GB. GCdataPR is playing an increasingly important role in data sharing.

Keywords: global change; data publishing; annual summary; achievement analysis; 2019

1 Introduction

In 2019, considerable progress was made in scientific data management, sharing, and policy by the Chinese Academy of Sciences and national authorities in China. On February 11, 2019, the Chinese Academy of Sciences issued the “Measures for Scientific Data Management and Open Sharing of the Chinese Academy of Sciences (Trial)” [1], an important step in implementing the national big data strategy and the “Scientific Data Management Measures”. On June 5, the Ministry of Science and Technology and the Ministry of Finance of P. R. China jointly announced the list of 20 national scientific data centers (Guokefaji 2019 (194) [2]), and GCdataPR became the data publishing sub-center of the National Earth Observation Scientific Data Center. On September 26, the Chinese delegation officially released the “Earth Data Support Sustainability Report” at the 74th UN General Assembly [3]. At the Fourth Plenary Session of the 19th Central Committee of the Communist Party of China, held on November 1, it was stated publicly for the first time that data, as a factor of production, can participate in distribution according to its contribution, which clarified the theoretical basis for scientific data work in the new era [4]. On November 8, the Committee on Data of the International Science Council (CODATA) officially released “The Beijing Declaration on Research Data” on its website [5]. The Declaration affirmed the data policies issued and implemented around the world and, on this basis, set out core principles for advancing multilateral cooperation in related fields.

As of December 31, 2019, GCdataPR, a Regular Member of the World Data System (WDS) of the International Science Council (ISC) and the National Earth Observation Data Publishing Center of China [6], had published 673 datasets in 31 issues, developed by 1,047 authors from 12 countries (and international organizations), with a total data size of 1.12 TB (258.82 GB after compression). Since August 1, 2019, GCdataPR has been recognized by the American Geophysical Union (AGU) as a trusted repository for the original data underlying papers published in AGU journals [7]. To promote publicity and transparency and to keep the academic community informed of progress in data publishing and sharing, in accordance with items 68 and 69 of the “Guidelines of Global Change Research Data Publishing & Repository” [8], the Geographical Big Data Working Committee of the Geographical Society of China held its 2019 annual conference in Dalian in September 2019 [9]. The theme of the conference was “Geographic Big Data Supports Sustainable Development Goals”. At the conference, the Geographical Society of China released the “2019 Global Change and Earth Science Dataset Impact Ranking” [10]. This ranking differed from previous editions in that the participating publishing systems were no longer limited to GCdataPR. In addition to the rankings of institutions, foundations, journals publishing related research papers, most-browsed datasets, and most-downloaded datasets, new impact-score rankings in four partitions were added for 2014-2018 datasets, data authors, and data authors' institutions.

This paper summarizes the data publishing work of 2019 from the perspectives of datasets, dataset authors, funding sources, dataset-related papers, and data sharing.

2 Statistics and Analysis of Published Datasets

2.1 Published Datasets

A total of 162 datasets were published in 6 issues in 2019 (Table 1), 6 fewer than in 2018 [11]. These comprised 13,866 data files packaged into 428 compressed data packages, a packaging rate of 32.40 files per package. The total data size was 71.72 GB (36.41 GB after compression), a compression ratio of 1.97.
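The two ratios in this paragraph appear to be simple quotients of the Table 1 totals: files per compressed package, and uncompressed size over compressed size. The Python sketch below reproduces them; the function names are illustrative only, and both definitions are inferred from the reported figures rather than stated by GCdataPR.

```python
def packaging_rate(n_files: int, n_packages: int) -> float:
    """Average number of data files per compressed package (inferred definition)."""
    return n_files / n_packages


def compression_ratio(size_gb: float, compressed_gb: float) -> float:
    """Uncompressed data size divided by compressed size (inferred definition)."""
    return size_gb / compressed_gb


# 2019 totals taken from Table 1
files, packages = 13866, 428
size_gb, compressed_gb = 71.72, 36.41

print(f"packaging rate:    {packaging_rate(files, packages):.2f}")           # 32.40
print(f"compression ratio: {compression_ratio(size_gb, compressed_gb):.2f}")  # 1.97
```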

2.2 Geographical Regions Covered by the Datasets

Five datasets (3.09% of the 162 total) are global in scale and two (1.23%) are trans-continental (Table 2). Datasets covering Asia constituted the largest share (138 datasets, 85.19% of the total). Of these, 76 covered China, accounting for 55.07% of the Asian datasets and 46.91% of all datasets published in 2019. Eight datasets (4.94%) covered Oceania, and 3, 2, 2, and 1 datasets covered the polar regions, Europe, Africa, and North America, respectively. In addition, 1 dataset concerned philatelic culture.
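Two different denominators are used in this paragraph: most shares are relative to all 162 datasets, while the 55.07% figure for China is relative to the 138 Asian datasets. A minimal Python sketch using the Table 2 counts makes the distinction explicit; it assumes the region categories are mutually exclusive, which is consistent with their summing to 162.

```python
TOTAL = 162  # datasets published in 2019

# Table 2: datasets by covered region (mutually exclusive categories)
regions = {
    "Global": 5, "Trans-continental": 2, "Asia": 138, "Europe": 2,
    "North America": 1, "Oceania": 8, "Africa": 2, "Polar regions": 3,
    "Other (Culture)": 1,
}
assert sum(regions.values()) == TOTAL


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in the paper's tables."""
    return round(100 * part / whole, 2)


shares = {name: pct(n, TOTAL) for name, n in regions.items()}

china = 76  # China-covering datasets are a subset of the Asian datasets
print(shares["Asia"])                # 85.19 (share of all datasets)
print(pct(china, regions["Asia"]))   # 55.07 (share of Asian datasets)
print(pct(china, TOTAL))             # 46.91 (share of all datasets)
```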

 

Table 1  Statistics of datasets published and archived in GCdataPR in 2019

Year/Month | Issue | Datasets | Data files | Compressed data packages | Data size (GB) | Compressed data size (GB)
2019.01 | 1 | 20 | 1,261 | 37 | 0.05 | 0.02
2019.02 | 2 | 60 | 914 | 119 | 0.19 | 0.02
2019.03-04 | 3 | 22 | 1,891 | 40 | 29.24 | 2.29
2019.05-08 | 4 | 20 | 4,697 | 162 | 33.65 | 30.50
2019.09-11 | 5 | 20 | 1,314 | 46 | 7.87 | 3.09
2019.11-12 | 6 | 20 | 3,789 | 24 | 0.72 | 0.49
Total in 2019 | 6 | 162 | 13,866 | 428 | 71.72 | 36.41
Total during 2014-2019 | 31 | 673 | 411,808 | 1,931 | 1,142.34 | 258.82

 

Table 2  Statistics of geographical regions covered by published datasets

Covering region | Number of datasets | Percentage (%)
Global | 5 | 3.09
Trans-continental | 2 | 1.23
Asia | 138 | 85.19
Europe | 2 | 1.23
North America | 1 | 0.62
Oceania | 8 | 4.94
Africa | 2 | 1.23
Polar regions | 3 | 1.85
Other (Culture) | 1 | 0.62
Total | 162 | 100.00

2.3 Published Datasets by Discipline

 

Table 3  Statistics of published datasets by domain

Domain | Discipline | Number of datasets | Percentage (%)
Terrestrial | Water | 18 | 11.11
Terrestrial | Land | 9 | 5.56
Terrestrial | Ecology/Biology | 15 | 9.26
Terrestrial | Atmosphere | 11 | 6.79
Terrestrial | Geology and Geophysics | 4 | 2.47
Terrestrial | Humanity/Economics | 16 | 9.88
Oceanic | Ocean (including ocean/coastal zone/islands) | 87 | 53.70
Others | Culture | 2 | 1.23
Total | | 162 | 100.00

 

The datasets published in GCdataPR covered a wide range of disciplines, including geography, resources, ecology, environment, atmosphere, ocean, land, plants, water, social economy, culture, art, and history (Table 3). As shown in Table 3, there were 73 datasets about terrestrial regions (45.06%); 87 datasets about oceans, including deep sea areas, shallow sea areas, polar regions, coastal areas, and islands (53.70%); and 2 datasets about culture and art (1.23%).

Among the 73 terrestrial datasets, 18 concerned water (rivers, lakes, and wetlands), accounting for 11.11% of the total; 9 concerned land (including land cover and land use), 5.56%; 15 concerned ecology and biology, 9.26%; 11 concerned the atmosphere (including weather and climate), 6.79%; 4 concerned geology and geophysics, 2.47%; and 16 concerned humanity and economics, 9.88%.

2.4 Data Levels

Table 4  Summary of datasets by production level

Data product level | Number of datasets | Percentage (%)
2 | 141 | 87.04
3 | 16 | 9.88
4 | 5 | 3.09

 

Table 5  Statistics of author teams and their datasets

Number of authors | Number of datasets | Percentage (%)
1 | 4 | 2.47
2-5 | 141 | 87.04
≥6 | 17 | 10.49

 

Table 6  Statistics of dataset author affiliations

Organization | Number of datasets
Chinese Academy of Sciences | 118
Ministry of Natural Resources of P. R. China | 75
Ministry of Education of P. R. China | 52
China Meteorological Administration | 6
Province | 3
Ministry of Water Resources of P. R. China | 2
Ministry of Science and Technology of P. R. China | 1
China Association for Science and Technology | 1
Ministry of Agriculture and Rural Affairs of P. R. China | 1
Total | 259
Datasets published | 162
Datasets developed by cross-department teams | 93
Percentage of cross-department datasets | 57.41%

 

All datasets are classified into levels 0-5 according to their stage of development; the level specifications are given in reference [7].

Based on these criteria, the 162 datasets published in 2019 fall into three production levels (Table 4): 87.04% are at level 2, 9.88% at level 3, and only 3.09% at level 4.

3 Dataset Author(s)

3.1 Dataset Author(s)

As of December 31, 2019, there were 1,047 dataset authors from 470 affiliations, an increase of 202 authors and 85 affiliations compared with the end of 2018.

3.2 Dataset Author Groups

Among the 162 datasets, only 4 (2.47%) were developed by a single author, 141 (87.04%) by teams of 2-5 persons, and 17 (10.49%) by teams of 6 or more persons (Table 5).

3.3 Statistics of Chinese Authors by Affiliation and Region

3.3.1 Datasets Authors by Affiliation (Institutes or Universities)

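Table 6 presents the affiliations of dataset authors. Authors from the Chinese Academy of Sciences contributed to the most datasets (118, 72.84% of the 162 datasets), followed by the Ministry of Natural Resources of P. R. China (75, 46.30%) and the Ministry of Education of P. R. China (52, 32.10%). Because a dataset is counted under every organization its authors belong to, these percentages overlap and sum to more than 100%. In 2019, 93 datasets (57.41%) were developed through cross-departmental cooperation.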
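Table 6 records 259 organization attributions for 162 datasets. The Python sketch below recomputes the three leading percentages and checks that the surplus of attributions is compatible with the 93 cross-department datasets. The final check rests on an assumption the paper does not state explicitly: that every dataset is attributed to at least one listed organization.

```python
PUBLISHED = 162        # datasets published in 2019
CROSS_DEPARTMENT = 93  # datasets developed by cross-department teams

# Table 6: number of datasets with at least one author from each organization
by_org = {
    "Chinese Academy of Sciences": 118,
    "Ministry of Natural Resources of P. R. China": 75,
    "Ministry of Education of P. R. China": 52,
    "China Meteorological Administration": 6,
    "Province": 3,
    "Ministry of Water Resources of P. R. China": 2,
    "Ministry of Science and Technology of P. R. China": 1,
    "China Association for Science and Technology": 1,
    "Ministry of Agriculture and Rural Affairs of P. R. China": 1,
}

attributions = sum(by_org.values())  # 259, larger than PUBLISHED
for org, n in list(by_org.items())[:3]:
    print(f"{org}: {100 * n / PUBLISHED:.2f}%")

# A dataset counted under k organizations adds k - 1 "extra" attributions.
# Assuming every dataset has at least one listed organization, each
# cross-department dataset adds at least one extra attribution.
extra = attributions - PUBLISHED  # 97
assert extra >= CROSS_DEPARTMENT
```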

3.3.2 Chinese Authors by Region (Province, Municipality, Autonomous Region)

The distribution of dataset authors from China is shown in Table 7. Authors from Beijing contributed to the most datasets (115, 70.99% of the total), followed by authors from Qinghai (15). Authors from most other provinces contributed to fewer than 10 datasets. A total of 28 datasets (17.28% of the total) were developed by authors from more than one province.

Historical statistics on the regions of Chinese dataset authors [12–13] likewise show that authors from Beijing have published the largest number of datasets. This reflects not only the concentration of global change research institutions in Beijing but also the greater attention that Beijing-based scientists pay to data publishing. As of the end of 2019, authors from 31 provinces, municipalities, and autonomous regions of China (excluding Hong Kong, Macao, and Taiwan) had published datasets.

 

Table 7  Statistics of Chinese authors by region

Province | Number of datasets
Beijing | 115
Qinghai | 15
Jiangsu | 8
Gansu | 7
Shanghai | 5
Guangdong | 5
Jilin | 4
Hubei | 4
Liaoning | 4
Shandong | 3
Shaanxi | 3
Sichuan | 3
Shanxi | 3
Guizhou | 2
Henan | 2
Xizang | 2
Jiangxi | 2
Yunnan | 2
Zhejiang | 2
Guangxi | 1
Hunan | 1
Ningxia | 1
Hainan | 1
Anhui | 1
Tianjin | 1
Fujian | 1
Total | 198
Datasets published | 162
Datasets developed by trans-province teams | 28
Percentage of trans-province datasets | 17.28%

 

3.4 Statistics of Datasets by Funding Agencies

Table 8  Statistics of funding supporting the datasets

Funding | Number of datasets | Percentage (%)
No fund | 23 | 14.20
One fund | 96 | 59.26
More than one fund | 43 | 26.54
Total | 162 | 100.00

 

Most datasets were developed with funding support (Table 8), accounting for 85.80% of the total. Specifically, 14.20% of the datasets were self-supported, 59.26% were funded by a single funding project, and 26.54% were funded by two or more funding projects; the latter usually involved large amounts of data, broad coverage, and long time series.

The 162 published datasets were supported by 222 funding projects (including sub-projects). Of these, 102 projects (45.95%) were from the Chinese Academy of Sciences, 55 (24.77%) from the National Natural Science Foundation of China, 29 (13.06%) from the Ministry of Science and Technology of P. R. China, and 22 (9.91%) from provinces or companies (Table 9).
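The shares in this paragraph are taken over the 222 funding projects, not over the 162 datasets. As a rough illustration, the Python sketch below recomputes them from the Table 9 counts and also sums the three largest sources for 2019. This is a project-level share, so it is not directly comparable with the dataset-level "more than 65%" figure reported for earlier years.

```python
# Table 9: number of funding projects (sub-projects) by source, 2019
projects = {
    "Chinese Academy of Sciences": 102,
    "National Natural Science Foundation of China": 55,
    "Ministry of Science and Technology": 29,
    "Province/Company": 22,
    "Ministry of Education of P. R. China": 3,
    "Ministry of Natural Resources of P. R. China": 3,
    "National Social Science Fund of P. R. China": 2,
    "China Meteorological Administration": 1,
    "Ministry of Civil Affairs of P. R. China": 1,
    "Outside of China": 1,
    "Others": 3,
}

total = sum(projects.values())  # 222 projects in all
shares = {src: round(100 * n / total, 2) for src, n in projects.items()}

# Combined share of the three largest funding sources
top3 = sorted(projects.values(), reverse=True)[:3]  # [102, 55, 29]
top3_share = 100 * sum(top3) / total
print(f"top three sources: {top3_share:.2f}% of projects")  # 83.78%
```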

 

Table 9  Statistics of funding projects supporting datasets development and publishing

Funding source | Number of funding projects | Percentage (%)
Chinese Academy of Sciences | 102 | 45.95
National Natural Science Foundation of China | 55 | 24.77
Ministry of Science and Technology | 29 | 13.06
Province/Company | 22 | 9.91
Ministry of Education of P. R. China | 3 | 1.35
Ministry of Natural Resources of P. R. China | 3 | 1.35
National Social Science Fund of P. R. China | 2 | 0.90
China Meteorological Administration | 1 | 0.45
Ministry of Civil Affairs of P. R. China | 1 | 0.45
Outside of China | 1 | 0.45
Others | 3 | 1.35
Total | 222 | 100

 

Statistics from previous years [11–14] likewise show that datasets supported by the Chinese Academy of Sciences, the National Natural Science Foundation of China, and the Ministry of Science and Technology of P. R. China accounted for more than 65% of the funded datasets, indicating that datasets generated by national research projects are the main source of published and shared data.

4 Association of Datasets, Research or Discovery Papers, and Data Papers

Two kinds of papers can be associated with a dataset: data papers and research (discovery) papers. In 2019, 81 papers were associated with the published datasets, including 35 data papers and 46 discovery papers or reports. The Journal of Global Change Data & Discovery has several columns, including data papers, reviews, new data technology, data impact, data policy and strategy, EU-China cooperation, data encyclopedia, and outreach reports. In 2019, the journal published 62 papers in total: 36 data papers, 4 reviews, 1 paper on data technology, 1 on data impact scores, 2 on data policy and scientific planning, 2 on EU-China cooperation, 11 in the global change data encyclopedia, 4 on academic activities, and 1 biographical introduction.

5 Statistics of Data Sharing

From 2014 to 2019, GCdataPR had 46,752 IP users from 97 countries, territories, or areas. In 2019, 5,205 new IP users were added, from 23 countries, territories, or areas. More than 3.59 million visitors accessed the GCdataPR website from 2014 to 2019 (Table 10), more than 2.25 million of them in 2019, about five times the number in 2018. From 2014 to 2019, the total number of data downloads was nearly 220,000 (with 0:00 Beijing Time as the baseline, multiple downloads of the same data file by the same IP address within 24 hours were recorded as one download). There were more than 70,000 downloads in 2019, about 4.5 times the number in 2018. From 2014 to 2019, the downloaded data size (after compression) exceeded 3.97 TB, of which 553.43 GB was downloaded in 2019. These figures show that GCdataPR's contribution to research data sharing is increasing year by year.
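The download-counting rule in parentheses can be read as bucketing download events by Beijing calendar day (UTC+8) and keeping one event per IP address, file, and day. The Python sketch below implements that reading. The (ip, file_id, timestamp) event format, the function name, and the calendar-day interpretation of the 24-hour rule are all assumptions, not GCdataPR's documented implementation.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # Beijing Time, UTC+8 (no daylight saving)


def count_downloads(events):
    """Count downloads, merging repeats of the same file by the same IP
    within one Beijing calendar day (00:00-24:00, UTC+8).

    `events` is an iterable of (ip, file_id, timestamp) tuples with
    timezone-aware timestamps; this log format is hypothetical.
    """
    seen = set()
    for ip, file_id, ts in events:
        day = ts.astimezone(BEIJING).date()  # calendar day in Beijing Time
        seen.add((ip, file_id, day))
    return len(seen)


utc = timezone.utc
log = [
    ("10.0.0.1", "f1", datetime(2019, 5, 1, 1, 0, tzinfo=utc)),   # 09:00 Beijing, May 1
    ("10.0.0.1", "f1", datetime(2019, 5, 1, 15, 0, tzinfo=utc)),  # 23:00 Beijing, May 1: merged
    ("10.0.0.1", "f1", datetime(2019, 5, 1, 17, 0, tzinfo=utc)),  # 01:00 Beijing, May 2: new day
    ("10.0.0.2", "f1", datetime(2019, 5, 1, 2, 0, tzinfo=utc)),   # different IP
    ("10.0.0.1", "f2", datetime(2019, 5, 1, 2, 0, tzinfo=utc)),   # different file
]
print(count_downloads(log))  # 4
```

Because the buckets are fixed calendar days rather than a rolling 24-hour window, two downloads only two hours apart still count twice when they straddle midnight Beijing Time, as the third event in the example shows.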

 

Table 10  Statistics of data sharing through the GCdataPR in 2018 and 2019*

Year | Visitors | Accum. visitors | New data users (IP) | Accum. data users (IP) | Data files downloaded | Accum. data files downloaded | Data size downloaded (GB) | Accum. data size downloaded (GB)
2018 | 454,976 | 1,335,794 | 4,750 | 41,547 | 17,147 | 143,055 | 836.87 | 3,512.57
2019 | 2,256,527 | 3,592,321 | 5,205 | 46,752 | 76,510 | 219,565 | 553.43 | 4,066.00

*Data for 2018 are from reference [11].
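Table 10 is internally consistent: each accumulated 2019 value equals the accumulated 2018 value plus the 2019 value. The TB figures in the text also match a binary conversion of 1 TB = 1024 GB, although that unit convention is an inference and is not stated in the paper. The Python sketch below checks both, together with the year-on-year ratios behind "about five times" and "about 4.5 times".

```python
# Table 10 figures
acc_2018 = {"visitors": 1_335_794, "data_users": 41_547,
            "files_downloaded": 143_055, "size_gb": 3_512.57}
in_2019 = {"visitors": 2_256_527, "data_users": 5_205,
           "files_downloaded": 76_510, "size_gb": 553.43}
reported_acc_2019 = {"visitors": 3_592_321, "data_users": 46_752,
                     "files_downloaded": 219_565, "size_gb": 4_066.00}

# Accumulated value = previous accumulated value + value for the year
for key, reported in reported_acc_2019.items():
    computed = acc_2018[key] + in_2019[key]
    assert round(computed, 2) == reported, key

# Year-on-year ratios quoted in the text
visitors_2018, files_2018 = 454_976, 17_147
print(f"visitors:  {in_2019['visitors'] / visitors_2018:.2f}x")        # 4.96x
print(f"downloads: {in_2019['files_downloaded'] / files_2018:.2f}x")   # 4.46x

# Binary units (1 TB = 1024 GB) reproduce the TB figures in the text
print(f"{reported_acc_2019['size_gb'] / 1024:.2f} TB")  # 3.97 TB
print(f"{1142.34 / 1024:.2f} TB")                        # 1.12 TB
```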

6 Discussion and Conclusion

In summary, GCdataPR moved steadily forward in 2019 and is playing an increasingly important role in the publishing and sharing of scientific data. The numbers of data users, dataset visits and downloads, data authors, and users' countries are all increasing year by year. In addition, the number of journals publishing scientific papers related to the datasets is increasing. Since August 2019, journals sponsored by the American Geophysical Union have required authors to deposit the data underlying their papers in designated data centers and repositories, and GCdataPR is honored to be one of these designated repositories.

However, many problems in the practice of data publishing remain to be discussed and solved. For example, some researchers are unclear about the concept of data intellectual property, some basic data have quality problems, and some authors are not enthusiastic about contributing data. It is hoped that the authorities will strengthen publicity and management and incorporate datasets into the evaluation and performance systems for researchers, so that datasets receive the same recognition as published scientific papers. It is also hoped that relevant policies and mechanisms will be put forward to encourage researchers to develop datasets in a down-to-earth way and produce high-quality data, providing basic and important support for global change research and economic development.

 

References

[1] Scientific Data Management and Open Sharing Measures of the Chinese Academy of Sciences (Trial). http://www.cas.cn/tz/201902/t20190220_4679797.shtml.

[2] Notice on Issuing the List of Optimization and Adjustment of the National Science and Technology Resource Sharing Platform by the Ministry of Science and Technology and the Ministry of Finance, P. R. China. http://www.most.gov.cn/mostinfo/xinxifenlei/fgzc/gfxwj/gfxwj2019/201906/t20190610_147031.htm.

[3] http://www.aircas.cas.cn/dtxw/rdxw/201909/t20190927_5402026.html.

[4] https://china.huanqiu.com/article/9CaKrnKnC4J.

[5] The Beijing Declaration on Research Data. https://codata.org/news/361/62/The-Beijing-Declaration-on-Research-Data.

[6] Liu, C., Guo, H. D., Uhlir, P., et al. GCdataPR: Infrastructure for data publishing & sharing in/for/with developing countries[J]. Journal of Global Change Data & Discovery, 2017. 1(1): 3-11. DOI: 10.3974/geodp.2017.01.02.

[7] Ma, J. H., Duan, Z. Q., Liu, C. GCdataPR identified as the trusted repository by American Geophysical Union to deposit the original data from research paper[J]. Journal of Global Change Data & Discovery, 2019. 3(3): 305-307. DOI: 10.3974/geodp.2019.03.13.

[8] Editorial Office of Journal of Global Change Data & Discovery. Guidelines of Global Change Research Data Publishing & Repository[J]. Journal of Global Change Data & Discovery, 2017. 1(3): 253-261. DOI: 10.3974/geodp.2017.03.01.

[9] Zhang, W., Shen, Y. Geographic data for sciences and sustainability - summary of 2019 conference of Geographic Big Data Working Committee of Geographical Society of China[J]. Journal of Global Change Data & Discovery, 2019. 3(3): 308-310. DOI: 10.3974/geodp.2019.03.14.

[10] Liu, C., Zhang, Y. H. Methodology and practice on quantifying the impact of global change & earth system science data in 2019[J]. Journal of Global Change Data & Discovery, 2019. 3(3): 207-226. DOI: 10.3974/geodp.2019.03.01.

[11] Shi, R. X., Ma, J. H., Liu, C., et al. Statistics and analysis of the global change research data publishing & sharing (2018)[J]. Journal of Global Change Data & Discovery, 2019. 3(1): 1-9. DOI: 10.3974/geodp.2019.01.01.

[12] Shi, R. X., Liu, C., Ma, J. H., et al. Statistics and analysis of global change research data publishing & sharing (2014-2017)[J]. Journal of Global Change Data & Discovery, 2017. 1(4): 383-390. DOI: 10.3974/geodp.2017.04.01.

[13] Geographical Society of China. Global change research data publishing & sharing rankings[R]. Journal of Global Change Data & Discovery, 2018. 2(3): 243-247. DOI: 10.3974/geodp.2018.03.01.

[14] Geographical Society of China. Global change research data publishing & sharing rankings (Top 10)[J]. Journal of Global Change Data & Discovery, 2017. 1(2): 249-251. DOI: 10.3974/geodp.2017.02.23.
