Named Entity Recognition Dataset for Four Regional Geological Survey Reports by Data Mining Methodology
MA Kai¹, TIAN Miao¹, TAN Yongjian¹, WANG Shu², XIE Zhong³,⁴, QIU Qinjun*³,⁴
¹ College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
² State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
³ National Engineering Research Center of Geographic Information System, Wuhan 430074, China
⁴ School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
DOI:10.3974/geodb.2021.09.04.V1
Published:Sep. 2021
Key Words:
Regional geological survey report, named entity identification, consistency check, testing, evaluation
Abstract:
Extracting and mining information from geological survey reports makes it possible to reuse the hidden value of existing reports and promotes the discovery of new knowledge. The Named Entity Recognition Dataset for Four Regional Geological Survey Reports by Data Mining Methodology was developed from the regional geological survey reports of the Nima area, Zhiduo County, Jinniuzhen-Gaoqiao, and Yangchun County (Guangdong Province). Six typical geological named entity types (geological time, geological structure, strata, rocks, minerals, and locations) and keywords were established. The six categories of entities were labeled using a cross-labeling mode involving domain experts and groups, with the assistance of software. The dataset went through four rounds of labeling, and at each stage it was checked, tested, and evaluated for consistency. The results show that consistency exceeded 85% after three rounds of labeling. The dataset is archived in .txt format and consists of a single 4.84 MB file.
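The abstract does not state how the consistency between annotators was computed. As an illustration only, the minimal sketch below shows one common way to score agreement between two annotators on the same sentence: entity-level F1 over exactly matching spans. The BIO tagging scheme, the tag names (LOC, ROC, MIN), and the function names are assumptions made for this example, not details taken from the dataset or the authors' procedure.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that ends the sequence.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def agreement_f1(a: Set[Span], b: Set[Span]) -> float:
    """Entity-level F1 between two annotators (exact span and type match)."""
    if not a and not b:
        return 1.0  # both annotators marked nothing: full agreement
    matched = len(a & b)
    return 2 * matched / (len(a) + len(b))


# Hypothetical labels from two annotators for the same five tokens.
annotator_a = ["B-LOC", "I-LOC", "O", "B-ROC", "O"]
annotator_b = ["B-LOC", "I-LOC", "O", "B-MIN", "O"]

score = agreement_f1(bio_to_spans(annotator_a), bio_to_spans(annotator_b))
print(f"{score:.2f}")  # 0.50: the location span matches, the second entity's type differs
```

Under this measure, a threshold such as the 85% reported above would correspond to a score of 0.85 or higher averaged over the labeled text; whether the authors used this measure or another one is not specified here.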
Foundation Item:
National Natural Science Foundation of China (42050101, U1711267, 41871311, 41871305)
Data Citation:
MA Kai, TIAN Miao, TAN Yongjian, WANG Shu, XIE Zhong, QIU Qinjun*. Named Entity Recognition Dataset for Four Regional Geological Survey Reports by Data Mining Methodology[J/DB/OL]. Digital Journal of Global Change Data Repository, 2021. https://doi.org/10.3974/geodb.2021.09.04.V1.
MA Kai, TIAN Miao, TAN Yongjian, et al. Development of a named entity recognition dataset based on four regional geological survey reports [J]. Journal of Global Change Data & Discovery, 2022, 6(1): 78-84.
References:
[1] Lu, S. W., Du, F. J., Ren, J. D. 1:250,000 regional geological survey report of Nima District (H45C001003) [DS]. National Geological Archives of China, 2002. DOI: 10.35080/n01.c.93307.
[2] Wang, Y. Z., Liu, S. J., Qi, S. S., et al. 1:250,000 regional geological survey report of Zhiduo County (I46C003004) [DS]. National Geological Archives of China, 2006. DOI: 10.35080/n01.c.105419.
[3] Li, X. W., Wu, B., Shi, B., et al. 1:50,000 regional geological survey report of Jinniu Town (H50E012003) and Gaoqiao (H50E013003) [DS]. National Geological Archives of China, 2009. DOI: 10.35080/n01.c.123962.
[4] Hong, Y. R., Guo, L. T., Liu, H. D., et al. 1:250,000 regional geological survey report of Yangchun County (F49C002003) [DS]. National Geological Archives of China, 2004. DOI: 10.35080/n01.c.122045.
Data Product:
ID | Data Name | Data Size
0 | Datapaper_无.pdf | 434.00 KB
1 | NERdata.txt | 4964.48 KB