Journal of Global Change Data & Discovery, 2020, 4(1): 47-54


Citation: Lin, L., Kong, X. Z., Li, N., et al. Dataset of Abnormal OLR Signals in Nepal (2009–2018) [J]. Journal of Global Change Data & Discovery, 2020, 4(1): 47-54. DOI: 10.3974/geodp.2020.01.07.


Dataset of Abnormal OLR Signals in Nepal

(2009–2018)

Lin, L.1*  Kong, X. Z.2  Li, N.2  Jiang, X. Y.1

1. College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, China

2. College of Computer and Information Sciences, Fujian Agriculture and Forestry University, China

 

Abstract: Studies have shown that changes in surface temperature before a large earthquake can cause abnormal outgoing longwave radiation (OLR), but there is currently no effective technique for extracting these anomalies. We propose a data mining algorithm called Abnormality Detection based on Randomized transducers and power Martingales (ADRM), which uses stochastic sensors and martingale theory to mine anomalies; experimental comparison indicates that it does so effectively. The dataset contains OLR source data from NOAA satellites and the corresponding data sequences obtained after anomaly mining for the Nepal region for the period 2009–2018. Spatially, the dataset covers 25 grid cells (five rows and five columns) centered on the epicenter of the Nepal earthquake (28.23°N, 84.73°E). Each grid cell covers 2.5° × 2.5°, and the epicenter is located in the central cell. In terms of time, each year is defined as the 366 days from September 28 of one year to September 28 of the next year. The dataset is archived in .xls format, consisting of a single file with a data size of 3.92 MB. A research paper based on this dataset, titled “Pre-earthquake Anomaly Data Mining of Remote Sensing OLR in Nepal Earthquake,” was published in the Journal of Geo-information Science (Vol. 20(8), 2018), and a paper titled “Relationship of Stress Changes and Anomalies in OLR Data of the Wenchuan and Lushan Earthquakes” was published in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Vol. 11, 2018).

Keywords: OLR; Nepal earthquake; Data mining; Abnormal information; Journal of Geo-information Science

1 Introduction

Earthquakes are often accompanied by an increase in geothermal radiation, and the pre-earthquake rise in land surface temperature in the seismogenic area can be monitored by thermal infrared remote sensing satellites[1]. Outgoing longwave radiation (OLR) refers to the energy density of electromagnetic waves emitted by the earth–atmosphere system to outer space. Large-scale changes in surface temperature cause fluctuations in OLR that can be remotely sensed by thermal infrared sensors carried on satellites[2]. OLR data are captured by the sun-synchronous polar-orbiting satellites of the US National Oceanic and Atmospheric Administration (NOAA). The telemetry data are spatially averaged multiple times, resulting in global daily and monthly average OLR data on a 2.5° × 2.5° longitude–latitude grid. Human activities, climate change, the greenhouse effect, and the effects of surface temperature, atmospheric temperature, water vapor, and cloud cover destabilize the temperature, meaning that OLR data are noisy and unstructured. The raw OLR data must therefore be processed using statistical principles and data mining techniques before the anomalous signals and pre-earthquake signs hidden in them can be extracted. Many scholars have proposed methods for extracting such signals, such as wavelet transforms, Bayesian estimation, fuzzy neural algorithms, and anomaly mining based on errors and key points of earthquake precursor observation data. However, most of these techniques are ineffective at extracting earthquake-related abnormalities and trends, and the bulk of remote sensing data are not fully utilized[3-9]. We propose an outlier signal analysis algorithm called Abnormality Detection based on Random process and power Martingales (ADRM).
This technique uses random sensors and power martingales to mine outlier data from the OLR source data, effectively capturing the changes in outlier signals and forming a new data sequence after outlier capture. The dataset mainly includes OLR data for the period 2009–2018. The region is a rectangular area centered on the epicenter (28.23°N, 84.73°E) of the 7.8-magnitude earthquake that struck Nepal on April 25, 2015. The dataset was formed by applying ADRM to extract abnormal signals from the 10 years of regional OLR data centered on these coordinates.

2 Metadata of Dataset

The metadata of the dataset of abnormal OLR signals in Nepal (2009–2018) are summarized in Table 1. Among other information, the table includes the full and short names of the dataset, the authors, year, temporal resolution, spatial resolution, data format, data size, data files, data publisher, and data sharing policy.

3 Methods

3.1 Algorithmic Principles

For the OLR data source, we performed three processing steps: regional grid division, data preprocessing, and abnormal signal capture using the ADRM algorithm.

The first step concerns the division of the region into grid cells[13]. OLR data are recorded in grid cells covering 2.5° × 2.5° in latitude and longitude, and the globe is divided into grid cells of this size. For example, the epicenter of the Nepal earthquake (28.23°N, 84.73°E) has relative coordinate values of (25, 34). The 25 grid cells centered on this coordinate value are the OLR data research objects. In the dataset, the relative coordinates of grid cell 1 in the upper left corner are (23, 32), and the corresponding dataset column name is Grid No. 1 (23, 32).
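The grid numbering described above can be sketched in a few lines of Python. The sketch assumes that the 25 cells are numbered row by row starting from the upper left cell; this ordering is not stated explicitly, although it agrees with cell 1 at (23, 32) and the epicenter cell 13 at (25, 34). The function names are hypothetical.

```python
# Relative coordinates of the upper left cell and the grid width, taken from the text.
FIRST_ROW, FIRST_COL, WIDTH = 23, 32, 5


def grid_number(row, col):
    """Return the 1-based grid number for relative coordinates (row, col).

    Assumption: cells are numbered row by row from the upper left cell (23, 32).
    """
    inside_rows = FIRST_ROW <= row < FIRST_ROW + WIDTH
    inside_cols = FIRST_COL <= col < FIRST_COL + WIDTH
    if not (inside_rows and inside_cols):
        raise ValueError("cell lies outside the 5 x 5 study area")
    return (row - FIRST_ROW) * WIDTH + (col - FIRST_COL) + 1


def relative_coords(number):
    """Inverse of grid_number: map a grid number (1..25) to (row, col)."""
    if not 1 <= number <= WIDTH * WIDTH:
        raise ValueError("grid number must be between 1 and 25")
    row_offset, col_offset = divmod(number - 1, WIDTH)
    return FIRST_ROW + row_offset, FIRST_COL + col_offset
```

Under this assumption, `grid_number(25, 34)` gives 13, the epicenter cell, and `relative_coords(25)` gives (27, 36), the lower right cell.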

The second step involves the preprocessing of the source data. The source data (OLR_Raw column) use the “Afternoon Satellite (1430–0230 LST)” data of the OLR data sequence captured by the NOAA-14 satellite. Using the afternoon dataset is believed to reduce the disturbance from human activity, noise, and climate that can be encountered in the daytime.

The preprocessing of the source data is as follows:

1. Filling in missing data. If data are missing for 1–2 days, data from the previous day are used to fill the missing values. Otherwise, the annual average is used as the missing value.

2. Unifying the length of the year. Only the first 28 days of February are kept, so that every year has the same number of days.

3. Removal of noise. When a value is below a threshold or is suspected of being unreasonable, the annual average is used in place of the noisy value.
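The three preprocessing steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the threshold `noise_floor`, the use of `None` for missing days, and the choice to compute the annual average from valid values only are all assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta


def drop_leap_days(days, values):
    """Step 2: keep only the first 28 days of February (drop February 29)."""
    kept = [(d, v) for d, v in zip(days, values)
            if not (d.month == 2 and d.day == 29)]
    return [d for d, _ in kept], [v for _, v in kept]


def preprocess_olr(raw, noise_floor=50.0, max_short_gap=2):
    """Steps 1 and 3: fill missing days, then replace noisy values.

    raw           -- daily OLR values for one year, None marks a missing day
    noise_floor   -- hypothetical threshold below which a value counts as noise
    max_short_gap -- gaps up to this length are filled with the previous day
    """
    valid = [v for v in raw if v is not None and v >= noise_floor]
    if not valid:
        raise ValueError("no valid values to compute the annual average")
    annual_mean = sum(valid) / len(valid)

    out = list(raw)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1  # j is the first non-missing index after the gap
            short_gap = (j - i) <= max_short_gap and i > 0
            fill = out[i - 1] if short_gap else annual_mean
            for k in range(i, j):
                out[k] = fill
            i = j
        else:
            i += 1

    # Step 3: values below the threshold are replaced by the annual average.
    return [annual_mean if v < noise_floor else v for v in out]
```

For example, `preprocess_olr([200.0, None, 230.0, None, None, None, 10.0, 200.0])` fills the one-day gap with 200.0, and replaces both the three-day gap and the value 10.0 with the annual average of 210.0.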

Table 1  Metadata summary of “Dataset of OLR abnormal signals in Nepal from 2009 to 2018”

Items | Description
Dataset full name | Dataset of OLR abnormal signals in Nepal from 2009 to 2018
Dataset short name | OLRAbnormalSignalNepal_2009-2018
Authors | Lin, L. AAB-6198-2019, College of Mathematics and Informatics, Fujian Normal University, linling@fjnu.edu.cn; Kong, X. Z. AAI-1869-2019, College of Mathematics and Informatics, Fujian Normal University, xzkong_fjnu@163.com; Li, N. AAB-3416-2020, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, 13509338919@qq.com; Jiang, X. Y. AAI-1865-2019, College of Mathematics and Informatics, Fujian Normal University, 13509338919@qq.com
Geographical region | Nepal region
Year | 2009–2018
Temporal resolution | day
Spatial resolution | 2.5° × 2.5°
Data format | .xls
Data size | 3.9 MB
Data files | The dataset includes 10 years of data for 25 grid cells centered on the epicenter of the Mw 7.8 earthquake in Nepal on April 25, 2015. The relative grid coordinates range from (23, 32) to (27, 36). The data are the source and result data of the abnormal signal analysis of OLR data for the 10 years from 2009–2018. They comprise 10 sheets representing 10 years of data, each composed of data for 25 grid cells; the data of each grid cell are divided into source data (OLR_Raw), preprocessed data (OLR_Prep), and result data (CD-Value) after extraction of abnormal information.
Foundation(s) | Province Natural Science Foundation of Fujian (2019Y0008), National Natural Science Foundation of China (61772004, 41601477)
Data publisher | Global Change Research Data Publishing & Repository, http://www.geodoi.ac.cn
Address | No. 11A, Datun Road, Chaoyang District, Beijing 100101, China
Data sharing policy | Data from the Global Change Research Data Publishing & Repository include metadata, datasets (data products), and publications (in this case, in the Journal of Global Change Data & Discovery). Data sharing policy includes: (1) Data are openly available and can be freely downloaded via the Internet; (2) End users are encouraged to use Data subject to citation; (3) Users, who are by definition also value-added service providers, are welcome to redistribute Data subject to written permission from the GCdataPR Editorial Office and the issuance of a Data redistribution license; and (4) If Data are used to compile new datasets, the ‘ten per cent principal’ should be followed such that Data records utilized should not surpass 10% of the new dataset contents, while sources should be clearly noted in suitable places in the new dataset[11].
Communication and searchable system | DOI, DCI, CSCD, WDS/ISC, GEOSS, China GEOSS

The result of the above preprocessing is stored in the OLR_Prep column of the dataset.

In the third step, the OLR_Prep data are mined for abnormal signals. That is, the ADRM algorithm based on martingale theory is used to mine the changing characteristics and trends of the OLR_Prep data[12–14] and generate a new Change Detection value (CD-Value). The principle of the ADRM algorithm is as follows:

The OLR dataset is defined as $X = \{x_1, x_2, \ldots, x_n\}$ using known historical data, where $x_i$ represents the OLR value. When the geological activity is stable, the OLR data should be relatively stable, with some similar characteristics between the sample data[15].

The outlier measure of the OLR data signal is determined as follows. The offset value of $x_i$ is $s_i = d(x_i, m)$, where $m$ is the clustering center of $X$ obtained by a clustering algorithm and $d(\cdot, \cdot)$ denotes the distance measure function. The initial outlier sequence is $S = \{s_1, s_2, \ldots, s_n\}$. The confidence map of a random sensor for sequence $S$[16] is then given by

$$p_i = \frac{\#\{j : s_j > s_i\} + \theta_i \, \#\{j : s_j = s_i\}}{i}, \qquad j = 1, \ldots, i \tag{1}$$

This is mapped to the confidence space $[0, 1]$, where the random value $\theta_i \in [0, 1]$, and $\#\{\cdot\}$ is the number of samples that meet the given condition. It can be seen from equation (1) that larger values of $p_i$ indicate that $x_i$ is more consistent with the distribution of historical samples, and it is therefore less likely that $x_i$ will be abnormal on that day.

A relatively small $p_i$ value on one day is not enough to indicate that the overall OLR data are abnormal. Thus, the randomized $p_i$ values corresponding to each data point $x_i$ are analyzed[17] using the following formula:

$$M_n^{(\varepsilon)} = \prod_{i=1}^{n} \varepsilon \, p_i^{\varepsilon - 1} \tag{2}$$

where $\varepsilon \in [0, 1]$, the initial value $M_0^{(\varepsilon)} = 1$, and the set of the first 50 points is the initial center of the cluster (i.e., the first 50 CD-Value data in the set are the same). To smooth the noise that may appear in OLR data and reduce any misjudgment of the change trend, the $M$ value is subjected to a $k$-point average smoothing to generate the CD sequence value:

$$\mathrm{CD}_n = \frac{1}{k} \sum_{i=n-k+1}^{n} M_i^{(\varepsilon)}. \tag{3}$$

Figure 1  Flowchart of abnormal signal acquisition from OLR data

Due to the relatively violent crustal movement before and after a major earthquake, the OLR data may fluctuate strongly within a short period of time. Although the CD value, which reflects the fluctuation, has been smoothed, it may still increase to an uncontrollable degree[18]. To prevent this from happening, we set a stop threshold. When the threshold is exceeded, the calculation is stopped and re-initialized at the current position.
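The procedure described in this section can be illustrated with a short Python sketch of a randomized power martingale in the sense of [14]. It is a sketch under stated assumptions, not the authors' ADRM code: the cluster center is taken as the mean of the first 50 points, the distance is the absolute difference, and the values `epsilon=0.92`, `window=5`, and `stop_threshold=1e6` are hypothetical placeholders, since the text does not give them.

```python
import random


def adrm_sketch(series, epsilon=0.92, init_size=50, window=5,
                stop_threshold=1e6, seed=0):
    """Return a smoothed change-detection sequence for a 1-D series.

    All parameter defaults are illustrative assumptions.
    """
    if not series:
        return []
    rng = random.Random(seed)
    head = series[:init_size]
    center = sum(head) / len(head)   # assumed cluster center: mean of first points

    history = []                     # offset (strangeness) values seen so far
    martingale = 1.0                 # initial martingale value
    m_values = []
    for index, x in enumerate(series):
        s = abs(x - center)          # assumed distance measure
        history.append(s)
        if index >= init_size:       # the first init_size values stay at 1.0
            greater = sum(1 for t in history if t > s)
            equal = sum(1 for t in history if t == s)
            theta = rng.random()     # random value in [0, 1)
            p = max((greater + theta * equal) / len(history), 1e-12)
            martingale *= epsilon * p ** (epsilon - 1.0)
        m_values.append(martingale)
        if martingale > stop_threshold:
            martingale = 1.0         # stop and re-initialize at this position
            history = []

    # Trailing moving average of the martingale values (assumed smoothing form).
    cd = []
    for i in range(len(m_values)):
        recent = m_values[max(0, i - window + 1): i + 1]
        cd.append(sum(recent) / len(recent))
    return cd
```

Points whose offset from the center is larger than almost all earlier offsets receive a small confidence value, which multiplies the martingale by a factor greater than one. A sustained shift in the series therefore makes the smoothed sequence climb, while values consistent with the history keep it roughly level.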

3.2 Technology Roadmap

The overall process of generating the dataset is shown in Figure 1. NOAA[12] provides OLR remote sensing data from NOAA satellites. This article considers OLR data from 2009–2018. We grid the regional data in the earthquake area, forming 25 grid cells for analysis. The OLR_Raw source data are preprocessed by filling in missing data, unifying the length of the year, and removing noise to obtain the OLR_Prep sequence, and the abnormalities are then analyzed using the ADRM algorithm to obtain the CD-Value data sequence.

4 Results and Validation        

4.1 Data Composition

The dataset is saved in an Excel file consisting of 10 sheets, with the sheet name representing the year of the data. Each sheet is composed of 25 sets of data, and each set contains the OLR source data, OLR preprocessed data, and CD-Value for the corresponding grid cell. The column name format “Grid No.*(**, **)” gives the grid number and the corresponding coordinates. For example, the coordinates of grid cell 1 are (23, 32) and the column name in the sheet is “grid No.1 (23, 32)”.
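Because the column label is written with slightly different spacing and capitalization in different places, code that reads the sheets may want to parse the label tolerantly. The following Python sketch does this with a regular expression. The helper names are hypothetical, and the exact label text in the .xls file is an assumption based on the examples given in this paper.

```python
import re

# Matches labels such as "Grid No. 1 (23, 32)" or "grid No.1 (23, 32)".
_LABEL = re.compile(
    r"grid\s*no\.?\s*(\d+)\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)",
    re.IGNORECASE,
)


def parse_grid_label(label):
    """Return (grid_number, row, col) parsed from a column label."""
    match = _LABEL.search(label)
    if match is None:
        raise ValueError(f"not a grid column label: {label!r}")
    number, row, col = (int(group) for group in match.groups())
    return number, row, col


def format_grid_label(number, row, col):
    """Build a label in the 'Grid No.N (row, col)' form used in this paper."""
    return f"Grid No.{number} ({row}, {col})"
```

For instance, `parse_grid_label("grid No.1 (23, 32)")` returns `(1, 23, 32)`.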

Table 2  Attribute description of each column in the dataset

Attribute | Description | Note
OLR_Raw | Raw source data from NOAA. | NCAR and NOAA. Available: ftp ftp.cpc.ncep.noaa.gov; cd precip/noaa18_olr for OLR data
OLR_Prep | Pre-processed data (following removal of invalid and noisy data). |
CD-Value | Abnormal information dataset. | Result of data analysis using ADRM algorithm.

4.2 Data Products

This dataset organizes data from both a geographic and a temporal perspective. In terms of time, each year is defined as running from September 28 of one year to September 28 of the next year, a total of 366 days (after preprocessing). Thus, 366 rows of data are stored in each sheet. Spatially, the Nepal earthquake epicenter (28.23°N, 84.73°E) is used as the center, and the neighboring area is divided into grid cells covering 2.5° of latitude and 2.5° of longitude. The epicenter grid coordinate is (25, 34). The grid is formed by extending from this cell in the longitude and latitude directions to form a square. For example, the latitude and longitude range of Grid No.1, which has coordinates (23, 32), is (31.98°N–34.48°N, 78.48°E–80.98°E).
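The latitude and longitude range of any cell follows from the epicenter position and the 2.5° cell size. The Python sketch below reproduces the range quoted for Grid No.1. It assumes that row numbers increase southward and column numbers increase eastward, which matches the example above; the function name is hypothetical.

```python
EPICENTER_LAT, EPICENTER_LON = 28.23, 84.73   # degrees north, degrees east
EPICENTER_ROW, EPICENTER_COL = 25, 34         # relative grid coordinates
CELL = 2.5                                    # cell size in degrees


def cell_bounds(row, col):
    """Return (lat_south, lat_north, lon_west, lon_east) of a grid cell.

    Assumption: rows increase southward and columns increase eastward,
    with the epicenter at the center of cell (25, 34).
    """
    center_lat = EPICENTER_LAT + (EPICENTER_ROW - row) * CELL
    center_lon = EPICENTER_LON + (col - EPICENTER_COL) * CELL
    half = CELL / 2
    return (center_lat - half, center_lat + half,
            center_lon - half, center_lon + half)


# Bounds of Grid No.1, rounded to two decimals for display.
print([round(v, 2) for v in cell_bounds(23, 32)])
```

Rounded to two decimals, the result for cell (23, 32) is 31.98 to 34.48 in latitude and 78.48 to 80.98 in longitude, in agreement with the range given in the text.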

4.3 Data Validation

This dataset contains OLR anomaly data for 25 grid cells covering a period of 10 years. We use grid cell 13 as an example to illustrate the effectiveness of the algorithm. Figure 2 shows the data for the period from September 28, 2014 to July 25, 2015. Figure 2(a) shows the fluctuation of the raw OLR data from the NOAA satellites. It is difficult to identify anomalies in these data by eye or through simple data analysis. Figure 2(b) shows the CD-Value waveform calculated by the ADRM algorithm, that is, the trend of the data after the abnormal information has been extracted. The three vertical lines in the figure represent the three earthquakes with magnitudes of 5.0, 7.8, and 7.3 that struck Nepal on December 18, 2014, April 25, 2015, and May 12, 2015, respectively. The seismic information was obtained from the online data provided by the US Geological Survey (USGS)[19].

It can be seen from the figure that the three earthquakes are consistent with the changes in CD-Value in terms of both timing and magnitude. The magnitude of the earthquake on May 12, 2015 was smaller, but the CD-Value curve displays a sudden increase. The value increased suddenly because the drastic changes in the data on April 25, 2015 made the subsequent mining of data changes more “sensitive.”

The OLR anomalies occurred about a month before the earthquake. The December 18 earthquake in Figure 2(b) began to produce anomalous OLR changes on November 20, and the April 25 earthquake gave rise to anomalous readings as early as February 25. The change in the anomalous CD-Values began to appear soon after, and over time the CD-Value tended to climb, with the general trend continuously fluctuating and rising, before reaching a peak on the day of the magnitude-7.8 earthquake, April 25. Although the curve declined thereafter, it then started to rise again until May 12, when Nepal experienced another large aftershock. As shown in Figure 2(b), the CD-Value rose sharply, and a peak appeared the day before the earthquake on May 12 (marked by the third vertical line). After that, the CD-Value quickly dropped, and although there were a number of small aftershocks in Nepal, there were no more large earthquakes. This shows that studying the CD-Value sequence of OLR anomaly information can provide a reference for earthquake prediction.

 

 

Figure 2  Comparison of OLR source data and CD-Value

 

By comparing the 10-year mean of the CD-Value with the 2015 CD-Value curve, the pre-seismic anomalies can also be analyzed. In Figure 3, the red triangles represent the three earthquakes, and the yellow curve shows the CD-Value in 2015. The blue curve represents the average CD-Value in grid cell 13 over the 10 years from 2009–2018 covered by the dataset. Comparison with the mean shows that the CD-Value before each of the three earthquakes exceeded the mean. For the first earthquake, the value began to exceed the mean around November 20, and the CD-Value reached its peak on December 16, 2 days before the earthquake. For the two subsequent earthquakes, the CD-Value also captured the anomaly and greatly exceeded the mean. The CD-Value fell after each earthquake, with the decline coinciding with the time of the earthquake.
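The comparison with the 10-year mean can be sketched as follows in Python. The sketch assumes that the CD-Value series of one grid cell has already been loaded into a dictionary keyed by year, with one list of daily values per year. Both the data layout and the function names are hypothetical, and the numbers in the usage example are toy values, not values from the dataset.

```python
def daily_mean_across_years(cd_by_year):
    """Average the CD-Value day by day over all years.

    cd_by_year -- dict mapping a year label to a list of daily CD-Values;
                  all lists must have the same length (366 in this dataset).
    """
    series = list(cd_by_year.values())
    n_days = len(series[0])
    if any(len(s) != n_days for s in series):
        raise ValueError("all years must contain the same number of days")
    return [sum(s[day] for s in series) / len(series) for day in range(n_days)]


def days_above_mean(year_series, mean_series):
    """Return the day indices on which one year exceeds the multi-year mean."""
    return [day for day, (value, mean) in enumerate(zip(year_series, mean_series))
            if value > mean]


# Toy example with three "years" of three days each.
toy = {2014: [1.0, 1.0, 1.0], 2015: [1.0, 5.0, 3.0], 2016: [1.0, 0.0, 2.0]}
mean = daily_mean_across_years(toy)
print(days_above_mean(toy[2015], mean))
```

In the toy example the day-by-day mean is 1.0, 2.0, 2.0, so the 2015 series exceeds it on the second and third days (indices 1 and 2).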

 

Figure 3  Comparison of the 10-year mean CD-Value and the 2015 CD-Value

In spatial terms, the CD-Value can be used to further study the relationship between the anomalous OLR signals and the area. First, the CD-Values of the 25 grid cells in 2015 were averaged over every 5-day period for about 30 days before the earthquake and 15 days after the earthquake. The resulting histogram of means is shown in Figure 4, where each bar corresponds to a mean CD-Value and the red vertical line denotes the time of the earthquake on May 12, 2015. The analysis suggests that grid cells 11, 12, 16, and 17 in the western vicinity of the epicenter (grid cell 13) showed obvious data anomalies, as shown in Figure 4, and all had large CD-Values about one month before the earthquake. Grid cell 13 (the epicenter) exhibits extreme changes.
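The 5-day averaging around an event can be sketched in Python as below. The window lengths follow the description above (about 30 days before and 15 days after); the function names and the convention that `event_index` is the row index of the earthquake day within a sheet are assumptions made for illustration.

```python
def window_around_event(series, event_index, days_before=30, days_after=15):
    """Slice a daily series from days_before days ahead of the event
    up to (but not including) days_after days after it."""
    start = max(0, event_index - days_before)
    stop = min(len(series), event_index + days_after)
    return series[start:stop]


def block_means(values, block=5):
    """Average consecutive blocks of `block` days; a shorter final block
    is averaged over the days it actually contains."""
    means = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        means.append(sum(chunk) / len(chunk))
    return means
```

Applied to one grid cell, `block_means(window_around_event(cd_values, event_index))` yields nine 5-day means, six before the event and three after it, which corresponds to one group of bars in a histogram such as Figure 4.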

 

 

Figure 4  Regional analysis of pre-seismic anomalies based on the CD-Value

Comparing the grid with the corresponding geographic location, the middle row of the grid in Figure 4 is located on the Mediterranean–Himalayan seismic zone, which is the boundary of the Eurasian plate with the African plate and the Indian Ocean plate. In Figure 4, there is a clear trend of signal change along the middle row on the seismic zone, and the signal characteristics are obvious in the lower half, which is consistent with compression by the Indian plate. Compared with the trend in mean CD-Value in other grid cells, the number of bars in which the mean CD-Value reaches or exceeds 200 is greatest in the seismic zone. In particular, the mean CD-Value of grid cells 11, 12, and 13 is generally around 200, indicating abnormal changes. The anomalous features of grid cells 12 and 13 are particularly obvious, which conforms to the behavior of the epicenter and the regional characteristics of the earthquake zone[13].

5 Discussion and Conclusion

The dataset described in this paper is based on the 2009–2018 OLR source data from a rectangular area of Nepal centered on 28.23°N, 84.73°E. The raw data were subjected to the ADRM algorithm for anomaly signal mining. For 25 grid cells, a three-dimensional matrix sequence dataset with dimensions of [366, 25, 10] was formed. Nepal is located in the Mediterranean–Himalayan seismic zone. This seismic zone is the junction of the Eurasian plate with the African plate and the Indian Ocean plate. Its seismic activity accounts for 24% of the total energy released by global earthquakes. Thus, taking Nepal as an example to study the relationship between earthquakes and OLR signals is of considerable research value.

The dataset produced in this work provides a basis for studying the correlation between OLR data and the occurrence of major earthquakes. Data mining methods are used to extract useful components from the signal, which can support subsequent research on earthquake prediction and even the analysis of other kinds of signals. The anomaly extraction approach and the resulting data provide further research ideas and a data resource.

Author Contributions

L.L. designed the algorithms for the dataset and wrote the data paper. X.Y.J. contributed to the data processing and collection. X.Z.K. contributed to the model design and algorithm. N.L. contributed to the data analysis and verification.

 

References

[1] Liu, D. F., Kang, C. L. Predicting heavy disasters by outgoing longwave radiation (OLR) of the earth. Earth Science Frontiers, 2003, 10(2): 427–435.

[2] Kong, X. Z., Bi, Y. X., Glass, D. Detecting seismic anomalies in outgoing Long-Wave Radiation data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014, 8(2): 649–660.

[3] Guo, X., Zhang, Y. S., Wei, C. X., et al. Medium Wave Infrared Brightness Anomalies of Wenchuan 8.0 and Zhongba 6.8 Earthquakes. Acta Geoscientica Sinica, 2014(3): 338–344.

[4] Lin, L., Kong, X., Li, N. A martingale-based temporal analysis of pre-earthquake anomalies at Jiuzhaigou, China, in the period of 2009-2018. E3S Web of Conferences, 2019, 131: 01072.

[5] Saraf, A. K., Choudhury, S. Cover: NOAA-AVHRR detects thermal anomaly associated with the 26 January 2001 Bhuj earthquake, Gujarat, India. International Journal of Remote Sensing, 2005, 26(6): 1065–1073.

[6] Ouzounov, D., Bryant, N., Logan, T., et al. Satellite thermal IR phenomena associated with some of the major earthquakes in 1999–2003. Physics and Chemistry of the Earth, 2006, 31(4): 154–163.

[7] Tramutoli, V., Cuomo, V., Filizzola, C., et al. Assessing the potential of thermal infrared satellite surveys for monitoring seismically active areas: The case of Kocaeli (İzmit) earthquake, August 17, 1999. Remote Sensing of Environment, 2005, 96(3): 409–426.

[8] Selva, J., Marzocchi, W., Papale, P., et al. Operational eruption forecasting at high-risk volcanoes: the case of Campi Flegrei, Naples. Journal of Applied Volcanology, 2012, 1(1): 5.

[9] Xiong, P., Bi, Y. X., Shen, X. H. Study of Outgoing Longwave Radiation Anomalies Associated with Two Earthquakes in China Using Wavelet Maxima. HAIS '09 Proceedings of the 4th International Conference on Hybrid Artificial Intelligence Systems, 2009: 77–87.

[10] Lin, L., Kong, X. Z., Li, N. Dataset of OLR abnormal signals in Nepal from 2009 to 2018 [DB/OL]. Global Change Data Repository, 2019. DOI: 10.3974/geodb.2019.05.11.V1.

[11] GCdataPR Editorial Office. GCdataPR Data Sharing Policy [OL]. DOI: 10.3974/dp.policy.2014.05 (Updated 2017).

[12] U.S. Department of Commerce. National Oceanic and Atmospheric Administration [DB/OL]. ftp://ftp.cpc.ncep.noaa.gov/precip/noaa18_olr.

[13] Lin, L., Kong, X. Z., Li, N. Pre-earthquake anomaly data mining of remote sensing OLR in Nepal earthquake. Journal of Geo-information Science, 2018, 20(8): 1169–1177.

[14] Ho, S. S., Wechsler, H. A Martingale Framework for Detecting Changes in Data Streams by Testing Exchangeability. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(12): 2113–2127.

[15] Kong, X., Li, N., Lin, L., et al. Relationship of Stress Changes and Anomalies in OLR Data of the Wenchuan and Lushan Earthquakes. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018, 11(8): 2966–2976. doi:10.1109/JSTARS.2018.2839089.

[16] Kong, X., Bi, Y., Glass, D. H. Detecting seismic anomalies in outgoing long-wave radiation data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(2): 649–660. doi:10.1109/JSTARS.2014.2363473.

[17] Molchan, G., Romashkova, L., Peresan, A. On some methods for assessing earthquake predictions. Geophysical Journal International, 2017, 210(3): 1474–1480.

[18] Li, N., Kong, X., Lin, L. Anomalies in continuous GPS data as precursors of 15 large earthquakes in Western North America during 2007-2016. Earth Science Informatics, 2019: 1–12.

[19] USGS. https://earthquake.usgs.gov/earthquakes/.
