Journal of Global Change Data & Discovery, 2023, 7(3): 233-241


Citation: Wang, R. H., Wang, P. H., Xu, C. D., et al. Spatial Dataset of Breast Cancer Incidence Rate in China (2014–2016) [J]. Journal of Global Change Data & Discovery, 2023, 7(3): 233-241. DOI: 10.3974/geodp.2024.02.04.

Spatial Dataset of Breast Cancer Incidence Rate in China (2014–2016)

Wang, R. H.1,2  Wang, P. H.1,2  Xu, C. D.1,2  Wang, W.1,2  Wang, Z. B.1,2*

1. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China;

2. University of Chinese Academy of Sciences, Beijing 100049, China

 

Abstract: Using data from the 2017–2019 China Cancer Registry Annual Reports, the authors compiled the breast cancer incidence rates recorded by cancer registries nationwide for 2014–2016. We vectorized and spatially visualized these county-level incidence rates using ArcGIS software, and applied descriptive and spatial statistical analyses to explore the regional differences, spatial distribution, and temporal changes in breast cancer incidence in China. The resulting dataset on the spatial distribution of county-level breast cancer incidence in China for 2014–2016 contains: (1) county-level breast cancer incidence data; (2) grouped statistics of county-level breast cancer incidence; and (3) statistics of breast cancer incidence in the eastern, central, and western regions. The dataset is archived in .shp and .xlsx formats and consists of 25 data files with a total size of 21.5 MB.

Keywords: China; breast cancer; incidence; spatial distribution

DOI: https://doi.org/10.3974/geodp.2024.02.04

CSTR: https://cstr.escience.org.cn/CSTR:20146.14.2024.02.04

Dataset Availability Statement:

The dataset supporting this paper was published and is accessible through the Digital Journal of Global Change Data Repository at: https://doi.org/10.3974/geodb.2024.06.06.V1 or https://cstr.escience.org.cn/CSTR:20146.11.2024.06.06.V1.

1 Introduction

Breast cancer is one of the most common cancers worldwide, with the highest incidence rates observed in developed regions such as North America, Europe, and Australia[1]. In China, the incidence of breast cancer is relatively low compared with the global levels; however, over the past 30 years, this incidence has increased by 20%–30%, for an annual growth rate of approximately 3%–5%, surpassing the global average growth rate of 1.5%[2]. The national cancer statistics released by the National Cancer Center (NCC) of China indicate that breast cancer has the highest incidence among the cancers affecting Chinese women[3]. The burden due to breast cancer in China may become more severe in the future with rapid economic and social development, population growth, the aging trend, and major prevalent risk factors[4]. Therefore, research on breast cancer in China is urgently needed to address and control the personal and socioeconomic burdens associated with breast cancer risk.

China is a vast territory with wide differences in its socioeconomic conditions and natural environments across various regions. Treating China as a single entity overlooks the regional heterogeneity in how risk factors affect disease incidence, potentially leading to research conclusions that do not align with the actual situation[5]. Most recent studies on the spatial distribution of and factors influencing breast cancer in China have been limited to small areas such as individual cities or provinces[6–9]. Nationwide studies have been relatively scarce and have mainly been conducted at the provincial level[10–12]. This limitation regarding spatial scale may have resulted in a lack of precision in the results, so the true breast cancer situation in China may not have been fully reflected.

In this study, we used counties as the unit of analysis and compiled breast cancer incidence data from the cancer registries listed in the 2017 to 2019 China Cancer Registry Annual Reports[13–15]. These data were converted into vector data using ArcGIS 10.8 software to create a dataset. With this dataset, we explored the regional differences, spatial distribution, and temporal changes in breast cancer incidence in China using statistical charts, vector data visualization, and spatial autocorrelation analysis. This study provides a scientific basis for formulating targeted breast cancer prevention and control strategies.

2 Metadata of the Dataset

The metadata of the Spatial dataset of breast cancer incidence rate in China (2014–2016)[16] are summarized in Table 1, including the dataset name, authors, geographical region, years covered, data files, data publication and sharing service platform, and data sharing policy.

3 Methods

3.1 Data Sources

The breast cancer data used in this study were obtained from the China Cancer Registry Annual Reports published by the National Cancer Center (NCC)[13–15]. The data cover the period from 2014 to 2016 and include 339, 388, and 487 cancer registry areas, respectively, across all 31 provinces in mainland China. The covered populations were 288 million, 321 million, and 382 million, accounting for 21.07%, 23.35%, and 27.6% of China's end-of-year population in 2014–2016, respectively. Women accounted for 142 million, 158 million, and 187 million of these populations, respectively, among whom 59,806, 67,328, and 79,450 new breast cancer cases were recorded.
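As a rough cross-check of these figures, the pooled crude incidence rate among women in the covered populations can be computed as new cases per 100,000 women. This sketch uses the rounded female population totals quoted above, so the results are approximate and are not rates reported in the annual reports.

```python
# Pooled crude breast cancer incidence among women in the covered populations.
# Figures are taken from the text; female populations are rounded to the
# nearest million, so the resulting rates are only approximate.
NEW_CASES = {2014: 59_806, 2015: 67_328, 2016: 79_450}
FEMALE_POPULATION = {2014: 142e6, 2015: 158e6, 2016: 187e6}


def crude_rate(cases: float, population: float, per: float = 100_000) -> float:
    """Return the number of new cases per `per` persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * per


for year in sorted(NEW_CASES):
    rate = crude_rate(NEW_CASES[year], FEMALE_POPULATION[year])
    print(f"{year}: {rate:.1f} per 100,000 women")
```

With these inputs the pooled crude rate is roughly 42 per 100,000 women in each year. Being a crude rate, it is not directly comparable with the age-standardized rates used in the rest of this study.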

The annual report includes two types of breast cancer incidence rate statistics: (1) The crude incidence rate is the basic indicator of the incidence in the population. This refers to the number of new cancer cases registered per 100,000 people in a specific year in a specific area and reflects the population incidence level. (2) The age-standardized rate (ASR) is adjusted for the age structure of a given standard population to eliminate the impact of age structure on the incidence level. Because the crude incidence rate is strongly affected by the age composition of the population, we selected the age-standardized incidence rate of breast cancer (units: per 100,000) from the annual report as the statistical indicator.
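The difference between the two indicators can be illustrated with direct age standardization, in which age-specific rates are weighted by the age structure of a standard population. The sketch below is a generic illustration under stated assumptions: the three age groups, the case counts, and the "standard population" weights are all hypothetical, and the annual report's own standard population is not reproduced here.

```python
# Direct age standardization: ASR = sum_k(w_k * r_k) / sum_k(w_k), where r_k
# is the age-specific rate in group k and w_k is the size of age group k in a
# standard population. All numbers below are HYPOTHETICAL, for illustration.
from typing import Sequence


def age_standardized_rate(cases: Sequence[float],
                          population: Sequence[float],
                          std_population: Sequence[float],
                          per: float = 100_000) -> float:
    """Directly age-standardized rate per `per` persons."""
    if not (len(cases) == len(population) == len(std_population)):
        raise ValueError("inputs must have one entry per age group")
    total_std = sum(std_population)
    weighted = sum(w * c / p for c, p, w in zip(cases, population, std_population))
    return weighted / total_std * per


# Three hypothetical age groups (young, middle-aged, older women).
cases = [5, 60, 90]
population = [400_000, 300_000, 100_000]   # a relatively young study population
std_population = [50_000, 35_000, 15_000]  # hypothetical standard population

crude = sum(cases) / sum(population) * 100_000
asr = age_standardized_rate(cases, population, std_population)
print(f"crude rate = {crude:.2f}, ASR = {asr:.2f} per 100,000")
```

Here the study population contains a smaller share of older women (12.5%) than the hypothetical standard population (15%), so the crude rate (about 19.4) is lower than the age-standardized rate (about 21.1). This is the age-composition effect that motivates using the ASR.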

Table 1  Metadata summary of the dataset

Items | Description
Dataset full name | Spatial dataset of breast cancer incidence rate in China (2014–2016)
Dataset short name | BreastCancerIR2014-2016
Authors | Wang, R. H., Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, wangruohan2446@igsnrr.ac.cn
 | Wang, P. H., Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, wph1996@126.com
 | Xu, C. D., Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, xucd@lreis.ac.cn
 | Wang, W., Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, wang_wei@lreis.ac.cn
 | Wang, Z. B., Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, wangzb@igsnrr.ac.cn
Geographical region | China
Years | 2014-2016
Data format | .shp, .xlsx
Data size | 21.5 MB
Data files | Spatial distribution vector data and tabulated statistical data on breast cancer incidence in China (2014-2016)
Foundations | National Natural Science Foundation of China (42130713); Ministry of Science and Technology of P. R. China (2019QZKK1005)
Data publisher | Global Change Research Data Publishing & Repository, http://www.geodoi.ac.cn
Address | No. 11A, Datun Road, Chaoyang District, Beijing 100101, China
Data sharing policy | (1) Data are openly available and can be downloaded free of charge via the Internet; (2) End users are encouraged to use Data subject to citation; (3) Users, who are by definition also value-added service providers, are welcome to redistribute Data subject to written permission from the GCdataPR Editorial Office and the issuance of a Data redistribution license; and (4) If Data are used to compile new datasets, the "ten per cent principle" should be followed, such that Data records utilized do not exceed 10% of the new dataset contents, while sources should be clearly noted in suitable places in the new dataset[17]
Communication and searchable system | DOI, CSTR, Crossref, DCI, CSCD, CNKI, SciEngine, WDS, GEOSS, PubScholar, CKRSC

 

3.2 Classification

We performed descriptive statistical analyses of the breast cancer incidence rates by region, following the regional classification used in the 2019 China Cancer Registry Annual Report[15]. The regions were defined as follows:

(1) Eastern region: Beijing, Tianjin, Hebei, Liaoning, Shanghai, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong, and Hainan (11 provinces/municipalities).

(2) Central region: Shanxi, Jilin, Heilongjiang, Anhui, Jiangxi, Henan, Hubei, and Hunan (eight provinces).

(3) Western region: Inner Mongolia, Guangxi, Chongqing, Sichuan, Guizhou, Yunnan, Tibet, Shaanxi, Gansu, Qinghai, Ningxia, and Xinjiang (12 provinces/municipalities/autonomous regions).
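When grouping county records for the regional statistics, the classification above can be encoded as a simple province-to-region lookup. This is a minimal sketch; the English province names as dictionary keys are an assumption about how records might be labeled.

```python
# Province-to-region lookup following the eastern/central/western
# classification listed above (31 provincial-level units in total).
REGIONS = {
    "Eastern": ["Beijing", "Tianjin", "Hebei", "Liaoning", "Shanghai", "Jiangsu",
                "Zhejiang", "Fujian", "Shandong", "Guangdong", "Hainan"],
    "Central": ["Shanxi", "Jilin", "Heilongjiang", "Anhui", "Jiangxi",
                "Henan", "Hubei", "Hunan"],
    "Western": ["Inner Mongolia", "Guangxi", "Chongqing", "Sichuan", "Guizhou",
                "Yunnan", "Tibet", "Shaanxi", "Gansu", "Qinghai", "Ningxia",
                "Xinjiang"],
}

# Invert the mapping so each province points to its region.
PROVINCE_TO_REGION = {p: region for region, provs in REGIONS.items() for p in provs}


def region_of(province: str) -> str:
    """Return the region of a province, failing loudly on unknown names."""
    try:
        return PROVINCE_TO_REGION[province]
    except KeyError:
        raise KeyError(f"unknown province: {province!r}") from None


print(region_of("Henan"))  # Central
```

Raising on unknown names, rather than silently returning a default, helps catch spelling mismatches between the registry tables and the lookup.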

3.3 Data Processing

The breast cancer incidence data for the cancer registry areas in 2014, 2015, and 2016 were extracted from the 2017, 2018, and 2019 China Cancer Registry Annual Reports, respectively. These data were compiled into an .xlsx file, which served as the source data for the descriptive statistics, and were then spatially visualized using ArcGIS 10.8 software to serve as the source data for the spatial statistical analysis. After the statistical areas of the cancer registries were matched to the attribute table of China's county-level vector data, the county-level vector data for 2014–2016 used in this study included 339, 386, and 485 districts/counties, respectively.
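The matching step explains why the number of counties (386, 485) is slightly smaller than the number of registries (388, 487). The sketch below shows one way such a join could be checked, assuming a shared county code is used as the join key; the key, the codes, and the rates are hypothetical, and the text does not state exactly how matching was performed.

```python
# Join registry-area incidence rates to the county attribute table by a shared
# county code and list records that cannot be matched. The use of a county
# code as join key, and all codes and rates below, are HYPOTHETICAL.
from typing import Dict, List, Set, Tuple


def match_to_counties(registry_rates: Dict[str, float],
                      county_codes: Set[str]) -> Tuple[Dict[str, float], List[str]]:
    """Split registry records into matched (code found in county layer) and unmatched."""
    matched = {code: rate for code, rate in registry_rates.items()
               if code in county_codes}
    unmatched = sorted(code for code in registry_rates if code not in county_codes)
    return matched, unmatched


registry_rates = {"110101": 45.2, "110102": 40.1, "990001": 30.0}  # hypothetical
county_codes = {"110101", "110102", "120101"}                       # hypothetical layer
matched, unmatched = match_to_counties(registry_rates, county_codes)
print(len(matched), "matched; unmatched:", unmatched)
```

Listing the unmatched records explicitly makes it easy to review each one by hand (for example, a registry area that spans several districts) instead of dropping it silently.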

3.4 Spatial Statistical Analysis

Geographical phenomena commonly exhibit spatial dependence: geographically close areas or observation points often show a degree of similarity or correlation. We therefore applied Moran's I, a statistic widely used to test for global spatial autocorrelation, to quantify this dependence. Moran's I measures the average similarity between each spatial observation and its neighboring observations, and its values generally range from −1 to 1[18]. A Moran's I greater than 0 denotes positive spatial autocorrelation, indicating that observations in neighboring areas tend to be similar; the larger the value, the stronger the correlation. A Moran's I less than 0 denotes negative spatial autocorrelation, indicating that observations in neighboring areas tend to be dissimilar. A Moran's I close to 0 (strictly, close to its expected value of −1/(n−1) under spatial randomness) indicates that the observations are essentially randomly distributed in space, with no spatial autocorrelation. Calculating Moran's I helps characterize the spatial patterns of geographical phenomena and provides a scientific basis for further spatial analysis and decision-making.

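Moran's I is calculated as follows:

$$I=\frac{n\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}\right)\sum_{i=1}^{n}(x_i-\bar{x})^2}\qquad(3)$$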

where x denotes the variable of interest; x_i and x_j are the attribute values of the i-th and j-th spatial units, respectively (for example, the breast cancer incidence rates of the i-th and j-th county-level units); \bar{x} is the mean of x over all units; w_ij is the spatial weight between units i and j; and n is the total number of spatial units.
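Equation (3) can be computed directly from a list of values and an n × n weight matrix. The sketch below is a plain-Python illustration of the formula; the toy layout (units in a single row, with binary weights between adjacent units) is a hypothetical example, since the weight definition used with ArcGIS in this study is not specified here.

```python
# Global Moran's I as in Eq. (3), computed from an n x n spatial weight matrix.
# The toy layout (units in a row, neighbours share an edge) is HYPOTHETICAL.
from typing import List, Sequence


def morans_i(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    n = len(x)
    if n < 2 or len(w) != n or any(len(row) != n for row in w):
        raise ValueError("x needs n >= 2 values and w must be n x n")
    mean = sum(x) / n
    z = [xi - mean for xi in x]                  # deviations from the mean
    s0 = sum(sum(row) for row in w)              # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    if s0 == 0 or den == 0:
        raise ValueError("weights must not all be zero and x must vary")
    return n * num / (s0 * den)


def line_adjacency(n: int) -> List[List[float]]:
    """Binary weights for n units in a row: w_ij = 1 if |i - j| == 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = line_adjacency(4)
print(morans_i([1, 2, 3, 4], w))  # smooth gradient -> positive I
print(morans_i([1, 4, 1, 4], w))  # alternating values -> negative I
```

For the smooth gradient the statistic is 1/3 (neighbors are similar), and for the perfectly alternating pattern it is −1 (neighbors are opposite), matching the interpretation given above.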

4 Data Results

4.1 Descriptive Statistical Analysis

Cancer registration involves long-term, continuous, and dynamic systematic monitoring of cancer prevalence, trends, and influencing factors. It is therefore a crucial foundation for developing cancer prevention and control strategies, conducting comprehensive research, and evaluating the effectiveness of cancer prevention and control[15]. From 2014 to 2016, the National Cancer Center included 339, 388, and 487 cancer registries, respectively, reflecting the expansion of registry coverage and the country's emphasis on cancer prevention and control and on public health. The numbers of districts and counties used in this study are shown in Table 2.

 

Table 2  Number of cancer registries from 2014 to 2016

Year | 2014 | 2015 | 2016
Number | 339 | 386 | 485
Increment | – | 47 | 99

 

First, a descriptive statistical analysis was performed on county-level breast cancer incidence rates in China. The histograms show that from 2014 to 2016, most counties had an age-standardized incidence rate (ASR) of breast cancer below 35 per 100,000, with the largest number of counties having an ASR between 15 and 30 per 100,000. This distribution was similar in all three years, indicating that the overall breast cancer incidence rate in China was relatively stable (Figure 1). However, the box plots reveal some variation in breast cancer incidence rates among the counties in eastern, central, and western China (Figure 2).
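The histogram summary above amounts to counting counties per rate bin and computing the share below a threshold. This sketch illustrates that counting; the bin width of 5 per 100,000 and the example rates are hypothetical and are not taken from Figure 1.

```python
# Count counties per incidence-rate bin, as a histogram would. The bin width
# of 5 per 100,000 and the rates below are HYPOTHETICAL.
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_counts(rates: Iterable[float],
               width: float = 5.0) -> Dict[Tuple[float, float], int]:
    """Map each half-open bin [low, high) to the number of rates falling in it."""
    counts = Counter(int(r // width) for r in rates)
    return {(k * width, (k + 1) * width): counts[k] for k in sorted(counts)}


rates = [12.0, 18.5, 22.1, 27.9, 29.0, 41.3]  # hypothetical county ASRs
for (low, high), count in bin_counts(rates).items():
    print(f"[{low:g}, {high:g}): {count}")

share_below_35 = sum(r < 35 for r in rates) / len(rates)
print(f"share below 35: {share_below_35:.0%}")
```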

 

Figure 1  Histogram of breast cancer incidence rates in China: (a) 2014; (b) 2015; (c) 2016

 

Figure 2  Box plot of breast cancer incidence rates in Eastern, Central, and Western China: (a) 2014; (b) 2015; (c) 2016

 

Table 3 shows that breast cancer incidence rates in the eastern region were generally higher than those in the central and western regions. The median incidence rate in the eastern region increased each year, and the eastern region had the largest variance, indicating substantial spatial heterogeneity in incidence rates across its counties. Although the median incidence rate in the central region varied little, its variance increased each year, suggesting a possible trend toward more pronounced spatial differentiation in this region. The median incidence rate in the western region was considerably lower than those in the eastern and central regions, and its variance remained nearly constant, indicating that the dispersion of incidence rates across western counties changed little over the study period.

 

Table 3  Incidence rate of breast cancer in Eastern, Central, and Western China, 2014–2016 (age-standardized rate, per 100,000)

Region | Year | Max | Min | Median | Variance
Eastern | 2014 | 62.23 | 9.37 | 25.79 | 100.57
Eastern | 2015 | 69.81 | 9.57 | 26.94 | 101.41
Eastern | 2016 | 86.75 | 6.96 | 27.54 | 100.67
Central | 2014 | 50.31 | 8.91 | 23.12 | 82.56
Central | 2015 | 60.36 | 1.30 | 22.75 | 86.70
Central | 2016 | 58.35 | 3.19 | 24.06 | 91.90
Western | 2014 | 46.51 | 0.00 | 19.84 | 89.29
Western | 2015 | 57.33 | 3.79 | 20.20 | 89.28
Western | 2016 | 52.56 | 3.88 | 19.89 | 89.60
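Summary statistics in the layout of Table 3 can be produced by grouping county rates by region. This is a minimal sketch with hypothetical records; in particular, whether Table 3 reports the sample or the population variance is not stated, so the use of the sample variance here is an assumption.

```python
# Per-region summary statistics (max, min, median, variance) of county ASRs,
# in the layout of Table 3. Records are HYPOTHETICAL, and the choice of the
# sample variance (rather than the population variance) is an ASSUMPTION.
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_by_region(
        records: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    groups: Dict[str, List[float]] = defaultdict(list)
    for region, rate in records:
        groups[region].append(rate)
    summary = {}
    for region, rates in groups.items():
        summary[region] = {
            "max": max(rates),
            "min": min(rates),
            "median": statistics.median(rates),
            "var": statistics.variance(rates),  # sample variance; needs >= 2 values
        }
    return summary


records = [("Eastern", 20.0), ("Eastern", 25.0), ("Eastern", 30.0),
           ("Central", 18.0), ("Central", 22.0)]
for region, stats in summarize_by_region(records).items():
    print(region, stats)
```

Swapping `statistics.variance` for `statistics.pvariance` gives the population variance if that is the intended definition.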

4.2 Spatial Statistical Analysis

From 2014 to 2015, Liaoning, Shandong, Henan, Shanghai, and Shenzhen had higher breast cancer incidence rates. In 2016, the high-incidence areas expanded to include Beijing, Hainan, Inner Mongolia, and some regions in western China. Regions with lower incidence rates were primarily located in central and western China and in Jiangsu and Fujian in the eastern region (Figure 3). The spatial autocorrelation analysis showed that from 2014 to 2016, the p-value for Moran's I of breast cancer incidence rates in China was consistently less than 0.0001, indicating statistically significant positive spatial clustering of the breast cancer incidence rates (Table 4).

 

Figure 3  Maps of breast cancer incidence rates in China (2014-2016)

 

Table 4  Global spatial autocorrelation analysis of breast cancer incidence rates in China (2014–2016)

Year | Moran's I | z-score | p-value | Spatial pattern
2014 | 0.17 | 7.12 | <0.0001 | Clustered
2015 | 0.12 | 13.15 | <0.0001 | Clustered
2016 | 0.16 | 20.62 | <0.0001 | Clustered
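The significance of Moran's I can also be assessed without distributional assumptions by a permutation (randomization) test: the observed values are repeatedly shuffled across the spatial units, and the observed I is compared with the resulting reference distribution. This is an alternative technique shown for illustration, not necessarily how the z-scores and p-values in Table 4 were obtained; the data and weights are a toy example.

```python
# Permutation (randomization) significance test for global Moran's I: shuffle
# the values over the units, recompute I, and compare with the observed I.
# This is an ALTERNATIVE to an analytical z-score; data and weights are toys.
import random
import statistics
from typing import Tuple


def morans_i(x, w):
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * num / (s0 * sum(v * v for v in z))


def permutation_test(x, w, n_perm=999, seed=0) -> Tuple[float, float, float]:
    """Return (observed I, pseudo z-score, one-sided pseudo p-value)."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    values = list(x)
    perm_stats = []
    for _ in range(n_perm):
        rng.shuffle(values)                # random reassignment of values to units
        perm_stats.append(morans_i(values, w))
    z_score = (observed - statistics.mean(perm_stats)) / statistics.stdev(perm_stats)
    n_extreme = sum(s >= observed for s in perm_stats)
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, z_score, p_value


n = 10
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
x = [float(v) for v in range(n)]  # values increase steadily along the row
print(permutation_test(x, w))
```

With 999 permutations the smallest attainable pseudo p-value is 0.001, so establishing p < 0.0001 as in Table 4 would require at least 9,999 permutations.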

 

Figure 4 shows that breast cancer incidence rates were higher in most regions in China in 2016 than in 2014. Notably, the incidence rates increased in areas such as Beijing–Tianjin, Liaoning, southern Hebei, the Yangtze River Delta, and the Pearl River Delta, with some increases observed in parts of southwest and northwest China. Conversely, the incidence rates decreased in certain counties in northern Hebei, eastern Shandong, and central Jiangsu in the eastern region; in central–southern Anhui, Hunan, and Hubei in the central region; and in some counties in Sichuan, Yunnan, and Gansu in the western region.
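A change map such as Figure 4 can be derived by differencing each county's rate between two years and labeling the sign of the change. The sketch below shows one plausible approach, not necessarily how Figure 4 was built: county identifiers and rates are hypothetical, and counties present in only one year are skipped because registry coverage expanded over the study period.

```python
# Classify each county's change in ASR between two years as an increase or a
# decrease. Counties present in only one year are skipped. County identifiers
# and rates are HYPOTHETICAL.
from typing import Dict, Tuple


def classify_change(before: Dict[str, float], after: Dict[str, float],
                    tol: float = 0.0) -> Dict[str, Tuple[float, str]]:
    result = {}
    for code in sorted(before.keys() & after.keys()):  # counties in both years
        diff = after[code] - before[code]
        if diff > tol:
            label = "increase"
        elif diff < -tol:
            label = "decrease"
        else:
            label = "no change"
        result[code] = (diff, label)
    return result


asr_2014 = {"A": 20.0, "B": 35.0, "C": 18.0}
asr_2016 = {"A": 26.5, "B": 30.0, "D": 22.0}  # "C" dropped out, "D" is new
for code, (diff, label) in classify_change(asr_2014, asr_2016).items():
    print(f"{code}: {diff:+.1f} ({label})")
```

The `tol` parameter (a hypothetical addition) allows small differences to be treated as no change if year-to-year noise is a concern.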

Figure 4  Analysis map of changes in breast cancer incidence rates in China (2014-2016)

5 Discussion and Conclusions

Based on data from the China Cancer Registry Annual Report from 2017 to 2019, we constructed a dataset that provides vector data on county-level breast cancer incidence rates in China for the years 2014–2016, enhancing the spatial precision of the previously published rates. Using descriptive and spatial statistical methods, we thoroughly investigated the regional differences and spatial trends in breast cancer incidence rates across China.

The results indicated that the breast cancer incidence rates in China exhibited considerable regional differences, with rates decreasing from the east to central and then to western regions, with notable spatial clustering. From 2014 to 2016, regions with higher breast cancer incidence rates expanded from Liaoning, Shandong, Henan, Shanghai, and Shenzhen to Beijing, Hainan, Inner Mongolia, and some areas in western China. Conversely, regions with lower incidence rates remained consistently distributed in the central and western regions, as well as in Jiangsu and Fujian in the eastern region. From 2014 to 2016, breast cancer incidence rates increased in most regions in China, particularly in areas such as Beijing–Tianjin, Liaoning, southern Hebei, the Yangtze River Delta, and the Pearl River Delta. Some counties in southwest and northwest China also experienced rising incidence rates. However, counties with decreasing incidence rates were primarily located in northern Hebei, eastern Shandong, and central Jiangsu in the eastern region; central–southern Anhui, Hunan, and Hubei in the central region; and Sichuan, Yunnan, Gansu, and Ningxia in the western region. In some high-incidence areas, such as Henan and Shandong, breast cancer incidence rates decreased over the study period. Conversely, regions with initially low incidence rates, such as southern Hebei, Jiangsu, northern Anhui, and Ningxia, showed an upward trend in their rates. This indicates that while monitoring, prevention, and treatment efforts should continue in high-incidence areas, increased attention must be paid to the rising incidence rates in lower-incidence areas. This focus will help to promote the rational allocation of healthcare resources, enhance public health awareness, and gradually control and reduce the national and individual burdens due to breast cancer.

The dataset covers approximately 17% of all counties in China; therefore, the true breast cancer incidence rates across the entire country may not be fully represented. However, the results can be considered reasonably representative given the overall volume of data. Future studies should build on these findings to further analyze the spatial variation in breast cancer incidence rates and the underlying mechanisms for these variations. This information will help with devising more effective and rational policies and methods for controlling and mitigating the various negative effects of breast cancer.

 

Author Contributions

Wang, R. H. developed the overall design for the construction of the dataset; Wang, R. H. collected and processed the data and wrote the paper; Wang, R. H., Wang, P. H., Xu, C. D., Wang, W., and Wang, Z. B. validated the data.

 

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Sung, H., Ferlay, J., Siegel, R. L., et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries [J]. CA: A Cancer Journal for Clinicians, 2021, 71(3): 209–249.

[2] Li, T., Mello-Thoms, C., Brennan, P. C. Descriptive epidemiology of breast cancer in China: incidence, mortality, survival and prevalence [J]. Breast Cancer Research and Treatment, 2016, 159(3): 395–406.

[3] Zheng, R., Zhang, S., Zeng, H., et al. Cancer incidence and mortality in China, 2016 [J]. Journal of the National Cancer Center, 2022, 2(1): 1–9.

[4] Lei, S., Zheng, R., Zhang, S., et al. Breast cancer incidence and mortality in women in China: temporal trends and projections to 2030 [J]. Cancer Biology & Medicine, 2021, 18(3): 900–909.

[5] Wang, J. F., Zhang, T. L., Fu, B. J. A measure of spatial stratified heterogeneity [J]. Ecological Indicators, 2016, 67: 250–256.

[6] Fei, X., Lou, Z., Christakos, G., et al. A geographic analysis about the spatiotemporal pattern of breast cancer in Hangzhou from 2008 to 2012 [J]. PLoS ONE, 2016, 11(1): 1–13.

[7] Song, M. J., Huang, X. X., Wei, X. Q., et al. Spatial patterns and the associated factors for breast cancer hospitalization in the rural population of Fujian Province, China [J]. BMC Women's Health, 2023, 23(1): 247–255.

[8] Huo, Q., Zhang, N., Wang, X., et al. Effects of ambient particulate matter on human breast cancer: is xenogenesis responsible? [J]. PLoS ONE, 2013, 8(10): e76609–e76615.

[9] Yu, Q., Zhang, L., Hou, K., et al. Relationship between air pollutant exposure and gynecologic cancer risk [J]. International Journal of Environmental Research and Public Health, 2021, 18(10): 5353–5366.

[10] He, R., Zhu, B., Liu, J., et al. Women's cancers in China: a spatio-temporal epidemiology analysis [J]. BMC Women's Health, 2021, 21(1): 116–129.

[11] Xia, C., Kahn, C., Wang, J., et al. Temporal trends in geographical variation in breast cancer mortality in China, 1973–2005: an analysis of nationwide surveys on cause of death [J]. International Journal of Environmental Research and Public Health, 2016, 13(10): 963–978.

[12] Hu, M. Y., Jiang, C., Meng, R. T., et al. Effect of air pollution on the prevalence of breast and cervical cancer in China: a panel data regression analysis [J]. Environmental Science and Pollution Research, 2023, 30(34): 82031–82044.

[13] National Cancer Center. 2017 China Cancer Registry Annual Report [M]. Beijing: People's Health Publishing House, 2018.

[14] National Cancer Center. 2018 China Cancer Registry Annual Report [M]. Beijing: People's Health Publishing House, 2019.

[15] National Cancer Center. 2019 China Cancer Registry Annual Report [M]. Beijing: People's Health Publishing House, 2021.

[16] Wang, R. H., Wang, P. H., Xu, C. D., et al. Spatial dataset of breast cancer incidence rate in China (2014-2016) [J/DB/OL]. Digital Journal of Global Change Data Repository, 2024. https://doi.org/10.3974/geodb.2024.06.06.V1. https://cstr.escience.org.cn/CSTR:20146.11.2024.06.06.V1.

[17] GCdataPR Editorial Office. GCdataPR data sharing policy [OL]. https://doi.org/10.3974/dp.policy.2014.05 (Updated 2017).

[18] Moran, P. A. P. Notes on continuous stochastic phenomena [J]. Biometrika, 1950, 37(1/2): 17–23.

 
