Fast Determination of Earthquake Depth Using Seismic Records of a Single Station, Implementing Machine Learning Techniques

Ochoa, Luis H.; Niño, Luis F.; Vargas, Carlos A.

doi:10.15446/ing.investig.v38n2.68407


Ingeniería e Investigación

Print version ISSN 0120-5609

Ing. Investig. vol.38 no.2 Bogotá May/Aug. 2018


Original articles



Luis H. Ochoa¹

Luis F. Niño²

Carlos A. Vargas³

¹ Civil Engineer, M.Sc. Geophysics, M.Sc. Geomatics, Ph.D. Systems Engineering, Universidad Nacional de Colombia, Colombia. Affiliation: Associate Professor at Sciences Faculty, Geosciences Department, Universidad Nacional de Colombia (Colombia). E-mail: lhochoag@unal.edu.co.

² Systems Engineer, M.Sc. Mathematics, Universidad Nacional de Colombia, Colombia. M.Sc. and Ph.D. Computer Science, the University of Memphis, United States of America. Affiliation: Professor at Sciences Faculty, Systems Engineering Department, Universidad Nacional de Colombia (Colombia). E-mail: lfninov@unal.edu.co.

³ Geologist, Universidad de Caldas, Colombia. M.Sc. and Ph.D. Seismic Engineering and Structural Dynamics, Universidad Politécnica de Catalunya, Spain. M.Sc. Physics Instrumentation, Universidad Tecnológica de Pereira, Colombia. Affiliation: Professor at Sciences Faculty, Geosciences Department, Universidad Nacional de Colombia (Colombia). E-mail: cavargasj@unal.edu.co.

ABSTRACT

The purpose of this research is to apply support vector machines (SVMs) to the fast determination of earthquake depths using seismic records from the "El Rosal" station, near the city of Bogotá, Colombia. The algorithm was trained with time signal descriptors of 863 seismic events recorded between January 1998 and October 2008. Only earthquakes with magnitude > 2 M_L were considered, and their signals were filtered to remove the various kinds of noise not related to earth tremors. During the SVM training stages, several combinations of kernel function exponent and complexity factor were tested for time signals of 5, 10 and 15 seconds and cutoff magnitudes of 2.0, 2.5, 3.0 and 3.5 M_L. The best SVM performance was obtained with time signals of 15 seconds and a cutoff magnitude of 3.5 M_L, using a kernel exponent of 10 and a complexity factor of 2, giving an accuracy of 0.6 ± 16.5 kilometers, which is good enough for use in an early warning system for the city of Bogotá. Providing this model with more recent seismic events is recommended in order to improve its accuracy.

Keywords: Earthquake early warning; rapid response; earthquake depth; seismic event; Bogotá - Colombia; support vector machine regression (SVMR); seismology; earthquakes


Introduction

This study is part of a line of research that proposes calculating earthquake hypocentral parameters with machine learning techniques, in order to develop an early warning system for the city of Bogotá. Bogotá's Savannah and its surrounding areas hold almost a third of the Colombian population and form the main economic center of the country, with almost 40% of the gross domestic product (Ojeda et al., 2002). This is why a seismic early warning system around Bogotá is so important, and earthquake depth is one of the main parameters of such a system.

The common way to calculate hypocentral parameters, including earthquake depth, consists of applying velocity models for the different rock layers of the earth and processing the travel times of P and S waves recorded at seismic stations (Zhang et al., 2014). In recent years, alternative approaches based on machine learning techniques have been developed, most of them using genetic algorithms (GA) and fuzzy logic (FL). The FL approaches allow efficient exploration of the search space (Lin & Sanford, 2001), while GA are mainly used to determine the X, Y, Z coordinates of the earthquake hypocenter (Sambridge & Gallagher, 1993). Ochoa et al. (2014) and Ochoa et al. (2017) successfully applied support vector machines (SVMs) to the estimation of hypocentral parameters using only a few seconds of signal recorded at a single seismological station, achieving reliable results.

The aim of this study is to apply SVMs to determine earthquake depth using data acquired at the "El Rosal" seismic station. This station is located northwest of Bogotá and is part of the Colombian seismic network (Figure 1) administered by the "Servicio Geológico Colombiano - SGC" (Colombian Geological Service). The distribution of earthquake depths around Bogotá is also shown in Figure 1, suggesting high seismicity in the surrounding areas.

Source: Authors

Figure 1 Distribution of Seismic Events Around Bogotá. Plane coordinates Gauss Bogotá Origin.

"El Rosal" Station and Dataset Used

The "El Rosal" station employs a three-component Guralp CMG - T3E007 sensor and a Nanometrics RD3-HRD24 digitizer, which samples the three channels simultaneously at 24-bit resolution (Bermudez & Rengifo, 2002). The data used are the three-component raw waveforms recorded at the station and its non-declustered seismic catalogue of 2164 events, spanning January 1, 1998 to October 27, 2008; all of these events are located less than 120 kilometers from the seismic station (Figure 1).

Before the SVM processing started, the waveform files were converted to the American Standard Code for Information Interchange (ASCII) format using a tool from the Seisan package (Ottemoller et al., 2016). Because earthquakes with magnitudes lower than 2.0 M_L can be related to man-made tremors, these events, along with all extreme or anomalous values, were removed, reducing the dataset to 863 events. Since the selected seismic records present variable levels of noise, they had to be filtered at both low and high frequencies. Low frequencies correspond to instrumental noise, which is easily eliminated with a high-pass filter with a cutoff frequency of 0.075 Hz; high frequencies were removed with a low-pass filter with a cutoff of 150 Hz (Wu & Zhao, 2006).

The statistical distribution of earthquake depths in the area is presented in Figure 2. This bar chart shows a bimodal pattern, suggesting two levels of seismic sources: the first at around 15 kilometers and the second at around 150 kilometers of depth. The seismicity at around 15 kilometers of depth is one of the main interests of early warning systems, because these shallow earthquakes usually cause great damage to nearby populations.

Source: Authors

Figure 2 Statistical Distribution of Earthquakes Depth.

Support Vector Machines (SVMs)

SVMs are a group of supervised learning algorithms for classification and regression problems. Given a training sample whose points are separated into classes, an SVM model can be trained to predict the classes of a new sample. An SVM represents the points of the sample in a space and separates the classes by the widest possible margin. When new samples are projected into this model, they are assigned to a class according to their proximity to the points of each class. The SVM model applied in this research is determined by its complexity factor "C" and the selected kernel function. The complexity factor regulates the accuracy of the model: it can lead the model to train properly (generalization) or push it to the point of overfitting. Proper generalization is the ability of the model to classify accurately samples different from those employed during the training stage, whereas overfitting occurs when the model can only classify correctly the sample used during training.

The kernel function projects a dataset into a space with specific characteristics and uses algorithms from linear algebra, geometry and statistics to identify linear patterns in the dataset. Any solution using kernel methods comprises two phases: the first is a module that maps the data into the projected space; the second is an algorithm designed to detect linear patterns in the space where the data is projected (Taylor & Cristianini, 2004). The kernel function applied in this study was of polynomial type, given by Equation (1).

Where "E" is the parameter representing the kernel exponent and (K) is the kernel function, which depends on the variables (x) and (y).
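A polynomial kernel with exponent E can be sketched as follows. The exact form of Equation (1) is an assumption here: the sketch uses the homogeneous form K(x, y) = (x · y)^E, and an inhomogeneous variant would add 1 to the dot product before raising it to E. The function name and the sample vectors are hypothetical.

```python
def polynomial_kernel(x, y, exponent):
    """Homogeneous polynomial kernel K(x, y) = (x . y) ** exponent.

    Assumed form; an inhomogeneous variant would use (x . y + 1) ** exponent.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** exponent


# Two hypothetical 3-element descriptor vectors.
x = [1.0, 2.0, 0.5]
y = [0.5, 1.0, 2.0]
k = polynomial_kernel(x, y, 2)  # (0.5 + 2.0 + 1.0) ** 2
print(k)
```

With a high exponent such as 10, the kernel value grows or shrinks very quickly with the dot product, so the descriptors would normally be scaled to a common range before training.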

The Input Data Set of the SVM

In the first stage, parameters previously used by other authors for earthquake magnitude estimation were calculated and employed as input variables, or descriptors, for the SVM of this research. Specifically, the relationship between maximum P wave amplitudes and local earthquake magnitudes was considered (Wu & Kanamori, 2005), and a linear regression was performed for each of the three components. Three parameters were taken from these linear regressions: the slope (M), the independent term (B) and the correlation coefficient (R). The maximum amplitude value (Mx) obtained in each component's time window was used as a descriptor as well. Therefore, each event had 12 descriptors associated with this concept (4 for each of the 3 components).
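The four amplitude descriptors of one component can be sketched as below. The text does not state which series is regressed, so the sketch assumes a straight-line fit of the running peak absolute amplitude against time; that choice, the function names and the sample values are all hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line ys ~ M * xs + B. Returns (M, B, R)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R is undefined for a constant series; report 0.0 in that case.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


def amplitude_descriptors(samples, sampling_rate_hz):
    """Return (M, B, R, Mx) for one component's time window."""
    times = [i / sampling_rate_hz for i in range(len(samples))]
    # Assumption: the regressed series is the running peak of |amplitude|.
    running_peak = []
    peak = 0.0
    for s in samples:
        peak = max(peak, abs(s))
        running_peak.append(peak)
    m, b, r = linear_fit(times, running_peak)
    return m, b, r, running_peak[-1]


# Hypothetical 5-sample window recorded at 1 Hz.
m, b, r, mx = amplitude_descriptors([0.0, 1.0, -2.0, 3.0, -4.0], 1.0)
print(m, b, r, mx)
```

Calling `amplitude_descriptors` once per component (vertical, north, east) yields the 12 values described above.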

Second, 9 descriptors used for epicentral distance estimation were added, obtained by fitting through linear regression an exponential expression in time (t), given by Equation (2).

This equation describes the envelope of the seismic record on a logarithmic scale (Odaka et al., 2003), and it was fitted by linear regression, with its respective correlation coefficient (R), for each component of the seismic station. The correlation coefficient (R) and the parameters (A) and (B) were calculated for each component, where (B) represents the slope of the initial part of the P waves and (A) is a parameter related to amplitude variations in time.
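The fit can be sketched as follows, assuming Equation (2) is the envelope function of Odaka et al. (2003), y(t) = B t exp(-A t). Dividing by t and taking logarithms gives ln(y / t) = ln(B) - A t, a straight line in t whose slope is -A and whose intercept is ln(B). The function name and the synthetic values of A and B are hypothetical.

```python
import math


def fit_envelope(times, envelope):
    """Fit y(t) = B * t * exp(-A * t) via a straight line in ln(y / t).

    Returns (A, B, R), where R is the correlation coefficient of the
    linearized fit ln(y / t) = ln(B) - A * t.
    """
    # Skip t <= 0 and y <= 0, where the logarithm is undefined.
    points = [(t, math.log(y / t)) for t, y in zip(times, envelope)
              if t > 0 and y > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    szz = sum((z - mean_z) ** 2 for _, z in points)
    stz = sum((t - mean_t) * (z - mean_z) for t, z in points)
    slope = stz / stt
    intercept = mean_z - slope * mean_t
    r = stz / math.sqrt(stt * szz)
    return -slope, math.exp(intercept), r


# Synthetic, noise-free envelope with hypothetical A = 0.8 and B = 50.
times = [0.1 * k for k in range(1, 31)]
envelope = [50.0 * t * math.exp(-0.8 * t) for t in times]
a, b, r = fit_envelope(times, envelope)
print(a, b, r)
```

On this noise-free input the fit recovers A and B, and R is close to -1 because the linearized series decreases with time. Three values (A, B, R) per component give the 9 descriptors of this group.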

Finally, parameters used for back-azimuth determination were added to include information about the source location of the seismic events in the model. The maximum eigenvalues of the two-dimensional covariance matrix were used as input, calculated as described in Magotra et al. (1987) and Magotra et al. (1989). A windowing scheme with one-second time windows was applied to obtain consecutive eigenvalues, to which a linear regression was fitted, giving the slope (M), the independent term (B) and the correlation coefficient (R) of the regression, this time with the addition of the arithmetic mean of the eigenvalues (P). This last processing works with all components of the station together, so only 4 descriptors were added as input from this process.
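The windowed eigenvalue computation can be sketched as below. The sketch assumes the two-dimensional covariance matrix is built from two horizontal traces, uses the population covariance, and splits the record into non-overlapping one-second windows; the trace names, sampling rate and sample values are hypothetical. For a symmetric 2x2 matrix the largest eigenvalue has a closed form, so no linear algebra library is needed.

```python
import math


def max_eigenvalue_2d(xs, ys):
    """Largest eigenvalue of the 2x2 covariance matrix of two traces."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cxx = sum((x - mean_x) ** 2 for x in xs) / n
    cyy = sum((y - mean_y) ** 2 for y in ys) / n
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Closed form for a symmetric matrix [[cxx, cxy], [cxy, cyy]].
    half_trace = (cxx + cyy) / 2.0
    return half_trace + math.sqrt(((cxx - cyy) / 2.0) ** 2 + cxy ** 2)


def windowed_max_eigenvalues(north, east, sampling_rate_hz, window_s=1.0):
    """Maximum eigenvalue for each consecutive, non-overlapping window."""
    size = int(round(sampling_rate_hz * window_s))
    return [max_eigenvalue_2d(north[i:i + size], east[i:i + size])
            for i in range(0, len(north) - size + 1, size)]


# Hypothetical 2-second record sampled at 4 Hz on two horizontal traces.
north = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
east = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
eigenvalues = windowed_max_eigenvalues(north, east, 4.0)
mean_eigenvalue = sum(eigenvalues) / len(eigenvalues)  # descriptor P
print(eigenvalues, mean_eigenvalue)
```

The remaining descriptors (M, B, R) would come from a straight-line fit of `eigenvalues` against the window index or window time.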

In summary, the SVM of this study employs 25 time signal descriptors as input data (Table 1): 12 related to work on magnitude calculation, 9 associated with epicentral distance estimation and the last 4 used in back-azimuth determination. These descriptors were calculated for the 5, 10 and 15 second signals of the 863 selected events.

Table 1 Summary of Descriptors Employed as Input Data of the SVM

Source: Authors

Figure 3 Relationship between real and calculated depth.

Results

Using the descriptors explained above and the real magnitude of each seismic event considered, a group of 12 datasets was evaluated in order to find the best combination of cutoff magnitude and time signal length (Table 2). The datasets correspond to the combinations of 4 cutoff magnitudes (2.0, 2.5, 3.0 and 3.5 M_L) and 3 time signal lengths (5, 10 and 15 s); each was evaluated with 7 values of the kernel exponent "E" and 6 values of the complexity factor "C", for a total of 504 tested SVM models. This processing was carried out with the Weka 3.6 software (Eibe et al., 2016); the algorithm has strong statistical support and is easily implemented at the station on electronic processing cards. The free parameters "C" and "E" of the SVM were chosen arbitrarily in this step, and they were precisely selected after the best combination of magnitude and time signal length was obtained.
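The size of this search can be checked by enumerating the grid. The cutoff magnitudes and signal lengths come from the text; the individual values of "E" and "C" are not listed in the text, so the two lists below are hypothetical placeholders that only respect the stated counts (7 and 6).

```python
from itertools import product

cutoff_magnitudes = [2.0, 2.5, 3.0, 3.5]  # M_L, from the text
signal_lengths_s = [5, 10, 15]            # seconds, from the text

# Hypothetical values: the text gives only how many were tested.
kernel_exponents = [1, 2, 3, 4, 5, 10, 20]
complexity_factors = [0.5, 1, 2, 5, 10, 20]

datasets = list(product(cutoff_magnitudes, signal_lengths_s))
models = list(product(datasets, kernel_exponents, complexity_factors))

print(len(datasets))  # 4 * 3 datasets
print(len(models))    # 12 * 7 * 6 models
```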

Table 2 Cutoff Magnitude and Time Length Combinations

Source: Authors

Pearson's coefficient was calculated for each partition; it measures the linear relationship between two variables independently of their scales (Table 2). Positive values mean that the two variables change in the same direction, i.e., high values of one variable correspond to high values of the other, and vice versa. The closer this value is to one, the greater the certainty that the two variables are linearly related. The best values of Pearson's coefficient are 0.857 and 0.898, for time signals of 10 and 15 seconds respectively, with magnitude > 3.5 M_L.

Table 3 shows a low standard deviation but high kurtosis for the 10-second time signal, indicating that the dispersion of the residual values is too high, so this signal length is not recommended for depth determination; it is better to choose the 15-second time signal, which has a standard deviation of 16.5 kilometers and lower kurtosis for a magnitude of 3.5. Once the best combination of magnitude and time signal is obtained, the free parameters "C" and "E" are chosen using Table 4, where Pearson's coefficient and the mean absolute error are presented for each combination. Based on these results, an SVM can be implemented using an "E" of 10 and a "C" of 2, giving a reliable model with an accuracy of 0.6 ± 16.5 kilometers in earthquake depth determination. Figure 3 shows the cross-plot of the real earthquake depth (X axis) against the depth calculated by the model (Y axis). The dashed line represents the linear behavior of the predicted data, corresponding to the locus where the prediction is equal to the real value.
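The statistics used in this comparison can be sketched as follows. Reading the reported accuracy as the mean residual plus or minus its standard deviation is an assumption, as are the population (divide by n) formulas and the use of excess kurtosis; the function names and the depth values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def residual_summary(real_depths, predicted_depths):
    """Mean, standard deviation, excess kurtosis and MAE of the residuals."""
    residuals = [p - r for r, p in zip(real_depths, predicted_depths)]
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n  # population variance
    m4 = sum((e - mean) ** 4 for e in residuals) / n  # fourth central moment
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "mae": sum(abs(e) for e in residuals) / n,
    }


# Hypothetical real and predicted depths in kilometers.
real = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
summary = residual_summary(real, predicted)
coefficient = pearson(real, predicted)
print(summary, coefficient)
```

A high kurtosis means the residuals have heavy tails, that is, a few events with very large depth errors, which a low standard deviation alone does not reveal.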

Table 3 Statistical Summary for Best Models in Each Combination

Source: Authors

Table 4 Pearson's Coefficient and Mean Absolute Error Used to Determine "E" and "C"

Source: Authors

Conclusions

The model proposed in this study is an important step towards the implementation of an earthquake early warning system for the city of Bogotá, Colombia, and its results can supply reliable information to this system from only 15 seconds of signal recorded at the "El Rosal" station. The accuracy achieved in earthquake depth determination is 0.6 ± 16.5 kilometers, employing a support vector machine with a complexity factor of 2 and a polynomial kernel function with an exponent of 10.

This model can be implemented directly in the electronic devices embedded at each seismological station, since the main mathematical operation is a simple matrix product involving the given kernel and a vector containing the calculated descriptors of the seismic event.
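The prediction step of a trained support vector regression model can be sketched as a kernel-weighted sum over the support vectors plus a bias. This is the generic form of such a predictor, not the trained model of this study: the homogeneous polynomial kernel is an assumption, and the support vectors, weights and bias below are hypothetical two-descriptor values chosen only to make the arithmetic easy to follow.

```python
def svr_predict(descriptors, support_vectors, weights, bias, exponent):
    """Depth estimate: sum_i w_i * (sv_i . x) ** E + b."""
    total = bias
    for support_vector, weight in zip(support_vectors, weights):
        dot = sum(a * b for a, b in zip(support_vector, descriptors))
        total += weight * dot ** exponent
    return total


# Hypothetical model with two support vectors and two descriptors.
support_vectors = [[1.0, 0.0], [0.5, 0.5]]
weights = [2.0, -1.0]
bias = 10.0

depth_km = svr_predict([1.0, 1.0], support_vectors, weights, bias, exponent=2)
print(depth_km)  # 2.0 * 1.0 - 1.0 * 1.0 + 10.0
```

The dot products for all support vectors amount to one matrix-vector product, which is why the cost at the station is small once the descriptors are available.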

The results shown in this study are of the same order as those of Hsiao (2011), who reported an accuracy of 7.9 ± 6.6 kilometers in earthquake depth determination using data from 5 stations. These results also improve on those of Romeu (2016), who reported an accuracy of ± 23 kilometers.

Recommendations

According to Figure 2, this model may be improved by applying two different SVM models: the first for shallow earthquakes, with depths around 15 kilometers, and the second for deeper earthquakes, with depths around 150 kilometers.

It is important to find ways to improve the prediction accuracy through further research, supported by computational intelligence and geophysics research groups as well as by the seismological network of Bogotá's Savannah and its surroundings, administered by the Universidad Nacional de Colombia. Additionally, the datasets should be complemented with recent seismic events not considered in this research.

Acknowledgements

The authors are grateful to the "Servicio Geológico Colombiano" (SGC) for providing the data used in this research and to the "Universidad Nacional de Colombia" for supporting our efforts to achieve a fast and reliable early warning system for the city of Bogotá - Colombia.

References

Bermudez, M., & Rengifo, F. (2002). EL ROSAL: La Estación Sismológica del CTBTO en Colombia. (8). Bogotá: Primer Simposio Colombiano de Sismología.

Eibe, F., Hall, M., & Witten, I. (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques" (4th ed.). Morgan Kaufmann.

Hsiao, N. (2011). A new prototype system for earthquake early warning system in Taiwan. Soil Dynamics and Earthquake Engineering, 201-208. https://doi.org/10.1016/j.soildyn.2010.01.008

Lin, K.-W., & Sanford, A. (2001). Improving Regional Earthquake Locations Using Modified G Matrix and Fuzzy Logic. Bulletin of the Seismological Society of America, 82-93.

Magotra, N., Ahmed, N., & Chael, E. (1987, June). Seismic event detection and source location using single station (three component) data. Bull. Seism. Soc. Am., 77(3), 958-971.

Magotra, N., Ahmed, N., & Chael, E. (1989, January). Single-station seismic event detection and location. IEEE Transactions on Geoscience and Remote Sensing, 27(1), 15-23. https://doi.org/10.1109/36.20270

Ochoa, L. H., Niño, L. F., & Vargas, C. A. (2017). Fast magnitude determination using a single seismological station record implementing machine learning techniques. Geodesy and Geodynamics, 1-8. https://doi.org/10.1016/j.geog.2017.03.010

Ochoa, L., Niño, L., & Vargas, C. (2014, December). Severity Classification of a Seismic Event based on the Magnitude-Distance Ratio Using Only One Seismological Station. Earth Sciences Research Journal, 18(2), 115-122. https://doi.org/10.15446/esrj.v18n2.41083

Odaka, T., Ashiya, K., Tsukada, S., Sato, S., Ohtake, K., & Nozaka, D. (2003, February). A new method for quickly estimating epicentral distance and magnitude from a single seismic record. Bull. Seism. Soc. Am., 93(1), 526-532. https://doi.org/10.1785/0120020008

Ojeda, A., Martinez, S., Bermudez, M., & Atakan, K. (2002, October). The new accelerograph network for Santa Fe De Bogota, Colombia. Soil Dynamics and Earthquake Engineering, 791-797. https://doi.org/10.1016/S0267-7261(02)00100-8

Ottemoller, L., Voss, P., & Havskov, J. (2016). SEISAN EARTHQUAKE ANALYSIS SOFTWARE FOR WINDOWS, SOLARIS, LINUX and MACOSX.

Romeu, P. (2016). Development of an Early Warning System Based on Earthworm: Application to Southwest Iberia. Bulletin of the Seismological Society of America, 1-12. https://doi.org/10.1785/0120150192

Sambridge, M., & Gallagher, K. (1993). Earthquake hypocenter locations using genetic algorithms. Bulletin of the Seismological Society of America, 1467-1491.

Taylor, J., & Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge, United Kingdom: Cambridge University Press.

Wu, Y. M., & Zhao, L. (2006, August). Magnitude estimation using the first three seconds P-wave amplitude in earthquake early warning. Geophys. Res. Lett., 33(16), L16312. https://doi.org/10.1029/2006GL026871

Wu, Y., & Kanamori, H. (2005). Experiment on an onsite early warning method for the Taiwan early warning system. Bulletin of the Seismological Society of America, 347-353. https://doi.org/10.1785/0120040193

Zhang, M., Tian, D., & Wen, L. (2014, February 25). A new method for earthquake depth determination: stacking multiple-station autocorrelograms. Geophysical Journal International, 197, 1107-1116. https://doi.org/10.1093/gji/ggu044

How to cite: Ochoa, Luis H., Niño, Luis F., Vargas, Carlos A. Fast Determination of Earthquake Depth Using Seismic Records of a Single Station, Implementing Machine Learning Techniques. Ingeniería e Investigación, 38(2), 91-103.

Received: October 19, 2017; Accepted: January 18, 2018

This is an open-access article distributed under the terms of the Creative Commons Attribution License
