Journal of Global Change Data & Discovery, 2020, 4(2): 176-182


Citation: Liu, Y. F., Lv, B. R., Peng, L., et al. Training Samples Dataset for Building Identification in the Urban Village [J]. Journal of Global Change Data & Discovery, 2020, 4(2): 176-182. DOI: 10.3974/geodp.2020.02.14.

DOI: 10

Training Samples Dataset for Building Identification in the Urban Village

Liu, Y. F.1,2  Lv, B. R.1,3  Peng, L.1  Wu, T.1,3  Liu, S.4

1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China;

2 Ucastech (Beijing) Smart Co. Ltd., Beijing 100080, China;

3 University of Chinese Academy of Sciences, Beijing 100049, China;

4 Beijing Qingruanhaixin Technology Co. Ltd., Beijing 100085, China



Abstract: Identifying buildings from remote sensing imagery is a basic method in urban management. The distribution pattern of building clusters, especially high building density and narrow streets, is of particular concern to urban managers. Based on remotely sensed images obtained from Google Maps, 2328 samples of building clusters in an urban village were drawn using LabelMe software. Building information was then extracted using Mask R-CNN, an instance segmentation algorithm used in deep learning. The data set includes: (1) original sample images (Buildingsample_pic); (2) sample segmentation results (Buildingsample_mask); and (3) sample segmentation annotations (Buildingsample_info). The data set consists of 6984 data files in three data folders, in .png and .yaml formats. The data set’s size is 499 MB (compressed into one file: 498 MB). The research paper related to the data set will be published in the Proceedings of the first China Digital Earth Conference.

Keywords: urban village; building cluster; deep learning; Mask R-CNN; Proceedings of the first China Digital Earth Conference

1 Introduction

With the continuous development of urban construction and urban governance, the problem of the urban village is now of widespread concern [1,2]. An urban village is a residential area built on the land of the original rural collective and on farmers’ homesteads during phases of urban expansion, and buildings are an important part of it. Urban villages are disordered, heterogeneous settlements, perhaps best described as “city is not like city, village is not like village” [3]. Because of their high building density, narrow streets and lanes, illegal construction, and other characteristics, urban villages are diverse in shape and structurally complex, which has long made them a contentious and difficult topic in academic research. Because of its unique distribution and patterning, the urban village building community is a target with discernible structural characteristics in the analysis of remotely sensed images of urban areas. In recent years, with advances in artificial intelligence and deep learning techniques, many scholars have begun to study how to apply deep learning to extract buildings from such imagery. Compared with data-driven and model-driven methods, building extraction based on machine learning requires less prior knowledge and can achieve high extraction accuracy when suitable samples are used [4-7]. In this paper, a medium-to-large city in northern China was selected as the basis for sample drawing. Using Google Maps remote sensing imagery with a spatial resolution of 0.11 m, a total of 2328 urban village building samples were drawn with LabelMe software. This study provides basic data for remote sensing image analysis based on deep learning, specifically the instance segmentation algorithm Mask R-CNN, and includes an application case of the instance segmentation samples. This work has practical significance for applying artificial intelligence information extraction in urban governance.

2 Metadata of the Dataset

The name, short name, author information, geographical region, data age, spatial resolution, data format, data volume, composition, computing environment, publishing and sharing service platform, data sharing policy, and other information for the Training Samples Dataset for Building Identification in the Urban Village [8] (Samples_BuiUrbanVill) are shown in Table 1.


Table 1  Dataset Metadata Profile of Training Samples Dataset for Building Identification in the Urban Village



Dataset Name

Training Samples Dataset for Building Identification in the Urban Village

Short Name Of Dataset

Samples_BuiUrbanVill

Author Information

Liu Yufei, Aerospace Information Research Institute, Chinese Academy of Sciences; Ucastech (Beijing) Smart Co. Ltd.

Lv Beiru, Aerospace Information Research Institute, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Peng Ling, Aerospace Information Research Institute, Chinese Academy of Sciences

Wu Tong, Aerospace Information Research Institute, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Liu Sai, Beijing Qingruanhaixin Technology Co. Ltd.

Data Age


Spatial Resolution

0.11 m

Data Format

.png, .txt, and .yaml

Data Volume

498MB (after compression)

Dataset Composition

(1) Sample segmentation results (Buildingsample_mask); (2) original sample images (Buildingsample_pic); (3) sample segmentation annotation (Buildingsample_info).

Fund Projects

The Beijing Municipal Science and Technology Project, No. Z191100001419002

Data Computing Environment


Python: 3.6; TensorFlow-gpu: 1.3.0; Keras: 2.0.8

Publishing and Sharing Service Platform

Global Change Research Data Publishing & Repository


Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, A11 Datun Road, Chaoyang District, Beijing 100101

Data Sharing Policy

The “data” of the Global Change Research Data Publishing & Repository include metadata (in Chinese and English), entity data (in Chinese and English), and data papers published through the Journal of Global Change Data & Discovery (in Chinese and English). The sharing policies are as follows: (1) The “data” are open to the whole society, free of charge, through the Internet in the most convenient way, and users can browse and download them free of charge; (2) End users must cite the data source in the references or at an appropriate position, following the citation format; (3) Users who provide value-added services, or who distribute and disseminate the “data” in any form (including through computer services), must sign a written agreement with the editorial department of the Journal of Global Change Data & Discovery (in Chinese and English) to obtain permission; (4) Authors who extract some records from the “data” to create new data must follow the 10% quotation principle, that is, the data records extracted from this dataset must constitute less than 10% of the total records of the new dataset, and the extracted data records must be marked with their data sources [9].

Data and Paper Retrieval System



3 Data Development Methods

The remotely sensed image samples in this project are divided into target detection samples, semantic segmentation samples, and instance segmentation samples according to their specific uses [6]. Samples for target detection must label the location and type of each target feature, that is, by drawing the bounding rectangle of the target feature and labeling its category. Samples for semantic segmentation must label the outline and type of the target features, that is, by drawing the outline of the target features and labeling their category. Samples for instance segmentation must label the outline and category of each individual object, that is, by drawing the outline of a single object and labeling its category. Currently, the most commonly used software tools for drawing on images are LabelMe, ArcGIS, and LabelImg.
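The relationship between the sample types can be illustrated with a short sketch: an outline drawn for instance segmentation already contains the information needed for a target detection sample, because the bounding rectangle follows directly from the outline. The function name and the coordinates below are hypothetical and serve only as an illustration; they are not part of the published data set.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def bounding_box(outline: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of a polygon outline."""
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical outline of one building (illustrative values only).
outline = [(10.0, 20.0), (60.0, 22.0), (58.0, 70.0), (12.0, 68.0)]
box = bounding_box(outline)
```

The reverse does not hold: a bounding rectangle cannot be turned back into an outline, which is why instance segmentation samples are the more expensive ones to draw.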


Figure 1  Flow chart of image samples’ drawing

Based on the remote sensing imagery and ground-level photographs, this paper used LabelMe software to draw building samples from an urban village, which were then used for deep learning with the instance segmentation algorithm. The operational flow chart is shown in Figure 1.

Drawing steps:

(1) Remote sensing imagery selection

Given the unique distribution pattern of the urban village building community, with its high building density and narrow streets and lanes, Google remote sensing imagery with a resolution of 0.11 m was selected as the image data for this data set.

(2) Image segmentation

The remote sensing image is divided into tiles of the target size, generally squares whose side length is a power of 2. For this sample set, the original image data and their labels were cut into 512 × 512 pixel tiles for subsequent model training. Segmenting the original image yields the pic folder, which is the sample’s original image set, as shown in Figure 2-a.
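The cutting step can be sketched as follows. This is a minimal illustration, not the authors’ tool: the image is represented as a plain list of pixel rows, the function name is hypothetical, and partial windows at the right and bottom edges are simply skipped, which is only one possible policy (padding would be another).

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for a real raster


def tile_image(image: Image, tile: int = 512) -> Iterator[Tuple[int, int, Image]]:
    """Yield (tile_row, tile_col, pixels) for every full tile x tile window."""
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            window = [row[left:left + tile] for row in image[top:top + tile]]
            yield top // tile, left // tile, window


# Small demonstration: a 4 x 6 image cut into 2 x 2 tiles.
demo = [[r * 6 + c for c in range(6)] for r in range(4)]
tiles = list(tile_image(demo, tile=2))
```

The same window coordinates would be applied to the image and to its label so that each tile and its annotation stay aligned.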

(3) Drawing the urban village buildings in LabelMe

Draw the outline of each building in LabelMe and label it in the form vbuilding *.

(4) Format conversion

Based on the JSON file generated by LabelMe, the drawn samples are converted to a dataset format that can be used for training. The generated mask images are then sorted to obtain the mask folder, which is the instance segmentation result set, as shown in Figure 2-b.


Fig 2-a. Original image of the sample

Fig 2-b. Image mask of the sample


Figure 2  Original image and mask of the sample
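The conversion from drawn outlines to instance masks can be sketched as below. This is an illustration under assumptions rather than the conversion script used for the data set: it assumes the annotation stores each outline under "shapes" with "label" and "points" fields, that labels within one image are unique, and it rasterises polygons with a simple even-odd test at pixel centres. All function names are hypothetical.

```python
import json
from typing import Dict, List

Mask = List[List[int]]


def point_in_polygon(x: float, y: float, poly: List[List[float]]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def masks_from_annotation(annotation: Dict, height: int, width: int,
                          prefix: str = "vbuilding") -> Dict[str, Mask]:
    """Rasterise every polygon whose label starts with `prefix` into a 0/1 mask."""
    masks: Dict[str, Mask] = {}
    for shape in annotation["shapes"]:
        label = shape["label"]
        if not label.startswith(prefix):
            continue  # ignore outlines that are not urban village buildings
        poly = shape["points"]
        masks[label] = [
            [1 if point_in_polygon(c + 0.5, r + 0.5, poly) else 0
             for c in range(width)]
            for r in range(height)
        ]
    return masks


# Hypothetical annotation with one building and one unrelated shape.
text = ('{"shapes": ['
        '{"label": "vbuilding1", "points": [[1, 1], [4, 1], [4, 3], [1, 3]]},'
        '{"label": "road", "points": [[0, 0], [1, 0], [1, 1]]}]}')
masks = masks_from_annotation(json.loads(text), height=4, width=6)
```

Keeping one mask per label is what distinguishes an instance segmentation sample from a semantic one, where all buildings would share a single mask.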


(5) Data enhancement

The generated mask images (mask) and original images (pic) are flipped horizontally and vertically, and rotated by 90°, 180°, and 270°, to increase the number of samples, as shown in Figure 3.


Fig 3-a. original image

Fig 3-b. horizontal flip

Fig 3-c. vertical flip

Fig 3-d. 90° rotation

Fig 3-e. 180° rotation

Fig 3-f. 270° rotation

Figure 3  Schematic diagram of data enhancement
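The enhancement step can be sketched with plain nested lists. The function names are hypothetical, and the rotation direction (clockwise here) is an assumption, since the text does not state it. The same transform has to be applied to an image and to its mask so that the two remain aligned.

```python
from typing import Dict, List

Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> Dict[str, Image]:
    """Return the original plus the five variants described in the text."""
    r90 = rotate_90(img)
    r180 = rotate_90(r90)
    r270 = rotate_90(r180)
    return {
        "original": img,
        "flip_h": flip_horizontal(img),
        "flip_v": flip_vertical(img),
        "rot90": r90,
        "rot180": r180,
        "rot270": r270,
    }


variants = augment([[1, 2, 3], [4, 5, 6]])
```

Each drawn sample therefore yields six image and mask pairs, including the original.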

4 Data Results and Validation

4.1 Dataset composition

The sample data set of urban village buildings includes: (1) the instance segmentation result set mask, in .png format; (2) the original sample image set pic, in .png format; and (3) the instance segmentation annotation information, info.yaml. A total of 2328 urban village building samples were drawn. After the data set was compressed into a *.rar file, the data volume was 498 MB.
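A downloaded copy can be checked against these figures with a short script. The sketch below is only an illustration: the folder names come from the data set description, but the assumption that the three files of one sample share a file stem is hypothetical, as is the function name.

```python
from pathlib import Path
from typing import Dict, Set

# Folder name -> file suffix, following the data set description.
FOLDERS = {
    "Buildingsample_pic": ".png",
    "Buildingsample_mask": ".png",
    "Buildingsample_info": ".yaml",
}
EXPECTED_SAMPLES = 2328
EXPECTED_FILES = 6984  # three files per sample


def check_dataset(root: Path) -> Dict[str, int]:
    """Count files per folder and confirm that all folders hold the same stems.

    Assumes (hypothetically) that the image, mask, and annotation of one
    sample share a file stem.
    """
    stems: Dict[str, Set[str]] = {}
    for folder, suffix in FOLDERS.items():
        stems[folder] = {p.stem for p in (root / folder).glob("*" + suffix)}
    reference = stems["Buildingsample_pic"]
    for folder, found in stems.items():
        if found != reference:
            raise ValueError(f"{folder} does not match Buildingsample_pic")
    return {folder: len(found) for folder, found in stems.items()}
```

Calling `check_dataset(Path("path/to/unpacked/dataset"))` on a complete copy should report 2328 files per folder, which is consistent with the stated total of 6984 files.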

Table 2  Description of data set file composition

Serial number | File name | Document description | Data volume (MB)
1 | Buildingsample_mask | sample segmentation results (.png) |
2 | Buildingsample_pic | original sample images (.png) |
3 | Buildingsample_info | sample segmentation annotation (.yaml) |



4.2 Validation of data results

The instance segmentation algorithm Mask R-CNN was used to extract building information [10-13], and 678 urban village building samples were used for testing and verification. The Mask R-CNN workflow for extracting urban village buildings is shown in Figure 4.

Figure 4  Mask R-CNN for urban village building extraction

To quantitatively evaluate the performance of the algorithm, average precision (AP) was used as the evaluation standard of experimental accuracy. After verification, the AP of the model on the test set was 0.66, and the highest AP for a single urban village building sample image reached 0.995. These results show that Mask R-CNN can achieve good detection performance on this sample data set of urban village buildings.

AP is the area enclosed by the precision-recall curve and the X and Y axes, and is calculated by formula (1). The higher the AP, the better the performance of the model. The calculation of AP therefore involves both precision and recall. Precision is the ratio of TP (true positives) to the number of all detected targets, as shown in formula (2). Recall is the ratio of TP to the number of all actual targets, as shown in formula (3).
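Written out from the definitions above, and assuming the standard forms of these measures, formulas (1) to (3) can be expressed as follows, where P(R) denotes precision as a function of recall, and TP, FP, and FN are the counts defined in Table 3:

```latex
\begin{align}
AP &= \int_{0}^{1} P(R)\,\mathrm{d}R \tag{1} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \tag{2} \\
\mathrm{Recall} &= \frac{TP}{TP + FN} \tag{3}
\end{align}
```

Here TP + FP is the number of all detected targets and TP + FN is the number of all actual targets, which matches the verbal definitions.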





Table 3  Evaluation index of target detection




True Positive (TP) | Number of positive samples detected correctly
True Negative (TN) | Number of negative samples detected correctly
False Positive (FP) | Number of negative samples detected as positive by error
False Negative (FN) | Number of positive samples detected as negative by error

Fig 5-a. original image

Fig 5-b. test result map


Figure 5  Analysis and comparison of test results


FN is obtained as the difference between the number of labeled buildings and TP. To calculate TP and FP, the IOU (Intersection over Union) is used to judge the correctness of each detection, with a threshold of 0.5. When IOU > 0.5, the detection is considered reliable, that is, a positive sample was correctly detected; otherwise, it is a false positive. The IOU is calculated as shown in formula (4).
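The counting rule can be sketched as follows, assuming the standard definition of IOU as the area of the intersection of the detected and labeled regions divided by the area of their union. Masks are represented as sets of pixel coordinates, and each detection is matched greedily to at most one labeled building; the matching strategy and all names are assumptions made for this illustration, not the evaluation code used in the paper.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def count_detections(predicted: List[Set[Pixel]], labeled: List[Set[Pixel]],
                     threshold: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) using a one-to-one greedy match with IOU > threshold."""
    unmatched = list(range(len(labeled)))
    tp = 0
    for pred in predicted:
        best_idx, best_iou = None, threshold
        for idx in unmatched:
            score = iou(pred, labeled[idx])
            if score > best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            unmatched.remove(best_idx)
            tp += 1
    fp = len(predicted) - tp
    fn = len(labeled) - tp  # labeled buildings minus TP, as in the text
    return tp, fp, fn


def rect(top: int, left: int, height: int, width: int) -> Set[Pixel]:
    """Helper that builds a rectangular pixel set for the demonstration."""
    return {(r, c) for r in range(top, top + height)
            for c in range(left, left + width)}


labeled = [rect(0, 0, 4, 4), rect(10, 10, 4, 4)]
predicted = [rect(0, 1, 4, 4), rect(20, 20, 4, 4)]
tp, fp, fn = count_detections(predicted, labeled)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

In this toy case the first detection overlaps a labeled building with an IOU of 0.6 and counts as a true positive, while the second overlaps nothing and counts as a false positive, so precision and recall are both 0.5.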


The results show that the average building area in the experimental area is 75.08 m², and the average nearest-neighbor distance is 0.90 m. According to the kernel density estimation results, the building density of the study area is 43.75%, and the green space rate is 5.12% [14]. According to the planning and design standards for urban residential areas of the People’s Republic of China, this makes it a high-density residential area [15].
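These figures can be related to the imagery through the 0.11 m pixel size. The sketch below converts between pixel counts and ground area and computes a simple built-up share of a mask. It is an illustration only: the pixel share is a simpler stand-in and not the kernel density estimation reported above, and the function names are hypothetical.

```python
from typing import List

RESOLUTION_M = 0.11  # ground length of one pixel side, from the metadata


def pixels_to_area_m2(pixel_count: int, resolution_m: float = RESOLUTION_M) -> float:
    """Ground area covered by a number of pixels."""
    return pixel_count * resolution_m ** 2


def area_m2_to_pixels(area_m2: float, resolution_m: float = RESOLUTION_M) -> float:
    """Number of pixels that cover a given ground area."""
    return area_m2 / resolution_m ** 2


def coverage_ratio(mask: List[List[int]]) -> float:
    """Share of non-zero (building) pixels in a mask."""
    total = sum(len(row) for row in mask)
    built = sum(1 for row in mask for value in row if value)
    return built / total if total else 0.0


tile_side_m = 512 * RESOLUTION_M                 # ground size of one tile side
mean_building_pixels = area_m2_to_pixels(75.08)  # pixels in an average building
demo_ratio = coverage_ratio([[1, 1, 0, 0], [1, 0, 0, 0]])
```

Under these assumptions a 512 × 512 tile spans about 56 m on the ground, and a building of the reported average area of 75.08 m² covers roughly 6200 pixels.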

5 Conclusion

The sample set is based on Google Maps remote sensing imagery with a spatial resolution of 0.11 m, in which the location, outline, and type of each individual building in an urban village were marked. Based on the samples, we provide an application case of instance segmentation of individual buildings in an urban village. Our experimental results show the following:

(1) The network structure of Mask R-CNN has advantages in building target detection. The sample set is highly practical for extracting information by deep learning with the instance segmentation algorithm Mask R-CNN. The AP reached 0.66, and the highest AP for a single urban village building sample image reached 0.995. When the sample quality is good and the features of the sample set are similar to those of the verification set, Mask R-CNN can achieve high precision and recall;

(2) Spatial analysis of the information extraction results can effectively convey the distribution characteristics of urban villages: small average building area, narrow streets, high building density, and complex building types.

The sample set provides basic data for extracting buildings in urban villages from remotely sensed images with deep learning algorithms. It has practical significance for studying the spatial distribution characteristics of urban villages and for intelligent analysis and its application in urban village governance.

Author contributions

Wu Tong was responsible for the technical route of the data set development; Liu Yufei and Lv Beiru collected and processed the sample data of urban villages; Lv Beiru was responsible for the design of models and algorithms; Lv Beiru and Liu Sai were responsible for data validation; Liu Yufei and Lv Beiru were responsible for writing the data paper. Peng Ling was responsible for data organization, sample types, and production process, as well as value judgment and evaluation.


References

[1] Li, Z. Y., Yang, Y. C. Research progress of urban village in China [J]. Gansu Science and Technology, 2008 (7): 7–11.

[2] Zhou, X. H. Urban village problem: an economic analysis of its formation, existence and transformation [D]. Shanghai: Fudan University, 2007.

[3] Deng, C. Y., Wang, Y. R. A review of the research on urban villages in China [J]. Journal of Guangdong University of Administration (1): 93–97.

[4] Zhao, Y. H., Chen, G. Q., Chen, G. L., et al. Extraction of urban village buildings from multi-source big data: a case study of Tianhe District, Guangzhou City [J]. Geography and Geographic Information Science, 2018, 34 (5): 3, 13–19.

[5] Liang, Y. D. Research on the application of UAV system in urban village reconstruction [J]. Beijing Surveying and Mapping, 2018, 32 (10): 70–73.

[6] Mayunga, S. D. Semi-automatic building extraction in informal settlements from high-resolution satellite imagery, 2006.

[7] Cheng, T. Construction and application method of big data of remote sensing image sample [J]. Application of Computer System, 2017, 26 (5): 43–48.

[8] Liu, Y. F., Lv, B. R., Peng, L., Wu, T., Liu, S. Training Samples Dataset of Building Identification in Urban Village. Global Change Data Repository, 2020. DOI: 10.3974/geodb.2020.02.16.V1.

[9] GCdataPR Editorial Office. GCdataPR Data Sharing Policy [OL]. DOI: 10.3974/dp.policy.2014.05 (Updated 2017).

[10] Ji, S. P., Wei, S. Q. Convolutional neural network and open source dataset method for building extraction from remote sensing images [J]. Acta Sinica Sinica, 2019, 48 (4): 50–61.

[11] Lin, T. Y., Dollár, P., Girshick, R., He, K., Belongie, S. Feature Pyramid Networks for Object Detection, 2016.

[12] Hirata, T., Kuremoto, T., Obayashi, M., et al. Deep Belief Network Using Reinforcement Learning and Its Applications to Time Series Forecasting. International Conference on Neural Information Processing. Springer International Publishing, 2016.

[13] Fu, F., Wei, J. Y., Zhang, L. N. Research on building extraction from remote sensing image based on convolution network [J]. Software Engineering, v21; 228 (6): 8–11.

[14] Lv, B. R., Peng, L., Wu, T., et al. Research on urban building extraction method based on deep learning convolutional neural network. IOP Conference Series: Earth and Environmental Science, 2020, 502: 012022.

[15] The regulations of the People’s Republic of China on the planning and design standards of urban residential areas. China Architecture & Building Press, 2002.