Named Entity Recognition Dataset for Four Regional Geological Survey Reports by Data Mining Methodology


MA Kai (1), TIAN Miao (1), TAN Yongjian (1), WANG Shu (2), XIE Zhong (3,4), QIU Qinjun* (3,4)
(1) College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
(2) State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
(3) National Engineering Research Center of Geographic Information System, Wuhan 430074, China
(4) School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China

DOI:10.3974/geodb.2021.09.04.V1

Published:Sep. 2021


Key Words:

Regional geological survey report, named entity identification, consistency check, testing, evaluation

Abstract:

Information extraction and mining of geological survey reports can unlock the hidden value of existing reports and promote the discovery of new knowledge. The Named Entity Recognition Dataset for Four Regional Geological Survey Reports by Data Mining Methodology was developed from the regional geological survey reports for the Nima area, Zhiduo County, Jinniuzhen-Gaoqiao, and Yangchun County (Guangdong Province). Six typical geological named entity types (geological time, geological structure, strata, rocks, minerals, and locations) and keywords were established. Entities in the six categories were labeled in a cross-labeling mode by domain experts and groups, with the assistance of software. The dataset went through four rounds of labeling, and at each stage it was checked, tested, and evaluated for consistency. The consistency check results exceeded 85% after three rounds of labeling. The dataset is archived in .txt format and consists of a single file of 4.84 MB.
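
The abstract does not say how the consistency check was computed. As an illustration only, the following minimal sketch assumes one common choice: exact-match F1 between the entity sets produced by two annotators. The function name, the span representation, the type labels, and the sample annotations are all hypothetical and are not taken from the dataset.

```python
from typing import Set, Tuple

# One entity annotation: (start offset, end offset, entity type).
# This representation is an assumption, not the dataset's actual format.
Span = Tuple[int, int, str]


def agreement_f1(annotator_a: Set[Span], annotator_b: Set[Span]) -> float:
    """Exact-match F1 between two annotators' entity sets."""
    if not annotator_a and not annotator_b:
        return 1.0  # neither annotator marked anything: full agreement
    matched = len(annotator_a & annotator_b)
    if matched == 0:
        return 0.0
    precision = matched / len(annotator_b)
    recall = matched / len(annotator_a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of the same sentence by an expert and a group.
expert = {(0, 8, "TIME"), (12, 20, "STRATA"), (25, 31, "ROCK"), (40, 46, "LOCATION")}
group = {(0, 8, "TIME"), (12, 20, "STRATA"), (25, 31, "MINERAL"), (40, 46, "LOCATION")}

score = agreement_f1(expert, group)
print(f"agreement: {score:.2%}")
```

Under this assumed metric, a labeling round would pass the 85% threshold mentioned above only if the score were greater than 0.85; the two annotators in the sample disagree on one entity type out of four, so the score falls short.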

Foundation Item:

National Natural Science Foundation of China (42050101, U1711267, 41871311, 41871305)

Data Citation:

MA Kai, TIAN Miao, TAN Yongjian, WANG Shu, XIE Zhong, QIU Qinjun*. Named Entity Recognition Dataset for Four Regional Geological Survey Reports by Data Mining Methodology [J/DB/OL]. Digital Journal of Global Change Data Repository, 2021. https://doi.org/10.3974/geodb.2021.09.04.V1.

MA Kai, TIAN Miao, TAN Yongjian, et al. Development of a named entity recognition dataset based on four regional geological survey reports [J]. Journal of Global Change Data & Discovery, 2022, 6(1): 78-84.

References:

[1] Lu, S. W., Du, F. J., Ren, J. D. 1:250,000 regional geological survey report of Nima District (H45C001003) [DS]. National Geological Archives of China, 2002. DOI: 10.35080/n01.c.93307.
[2] Wang, Y. Z., Liu, S. J., Qi, S. S., et al. 1:250,000 regional geological survey report of Zhiduo County (I46C003004) [DS]. National Geological Archives of China, 2006. DOI: 10.35080/n01.c.105419.
[3] Li, X. W., Wu, B., Shi, B., et al. 1:50,000 regional geological survey report of Jinniu Town (H50E012003) and Gaoqiao (H50E013003) [DS]. National Geological Archives of China, 2009. DOI: 10.35080/n01.c.123962.
[4] Hong, Y. R., Guo, L. T., Liu, H. D., et al. 1:250,000 regional geological survey report of Yangchun County (F49C002003) [DS]. National Geological Archives of China, 2004. DOI: 10.35080/n01.c.122045.

Data Product:

ID   Data Name           Data Size
0    Datapaper_无.pdf    434.00 KB
1    NERdata.txt         4964.48 KB