Fast estimation of earthquake epicenter distance using a single seismological station with machine learning techniques

Ochoa, Luis H.; Niño, Luis F.; Vargas, Carlos A.

doi:10.15446/dyna.v85n204.68408

DYNA

Print version ISSN 0012-7353

Dyna rev.fac.nac.minas vol.85 no.204 Medellín Jan./Mar. 2018

https://doi.org/10.15446/dyna.v85n204.68408

Luis H. Ochoa, Luis F. Niño and Carlos A. Vargas

Universidad Nacional de Colombia, Bogotá, Colombia. lhochoag@unal.edu.co, lfninov@unal.edu.co, cavargasj@unal.edu.co

Abstract

A Support Vector Machine Regression (SVMR) algorithm was applied to calculate the epicenter distance from a ten-second signal window recorded after the arrival of primary (P) waves at a seismological station near Bogota, Colombia. The algorithm was tested with 863 earthquake records; the input parameters were an exponential function of the waveform envelope estimated by least squares and the maximum value of the recorded waveforms for each component of the seismic station. Cross-validation was applied to normalized polynomial kernel functions, obtaining the mean absolute error for different exponents and complexity parameters. The epicenter distance was estimated with an absolute error of 10.3 kilometers, improving on results previously obtained for this hypocentral parameter. The proposed algorithm is easy to implement in hardware and can be deployed directly in the field, enabling fast decisions at seismological control centers and increasing the chances of an effective response.

Keywords: earthquake early warning; support vector machine regression; earthquake; rapid response; epicenter distance; seismic event; seismology; Bogota - Colombia

1. Introduction

Bogota’s Savannah and surrounding areas are home to nearly a third of Colombia’s population and are the country’s main economic center, with around 40% of the gross domestic product [1]. A destructive seismic event in this area would have severe social and economic effects on the entire country; this is why a seismic early warning system around Bogota is important, and epicenter distance is one of the main parameters of such a system. The epicenter distance is the distance between the earthquake epicenter and the seismological station, where the epicenter is the point on the Earth’s surface vertically above the earthquake focus [2]. The density of seismological stations around Bogota is not high enough: locating a seismic event takes longer than the seismic waves need to reach the areas where the warning is required. An alternative is to use seismological data from previous events recorded at a single station to calculate earthquake hypocentral parameters.

A seismic early warning system issues an alert from a few seconds to a few tens of seconds before the strong shaking arrives; it can be based on a single three-component station implementing methods of bio-inspired computing, natural computation or computational intelligence. These methods have been successfully applied in many areas of knowledge; in seismology, they allow hypocentral parameters to be estimated from just a few seconds of signal recorded at a single seismological station, achieving acceptable accuracy and generating reliable alerts. This approach is very useful in areas with sparse seismic networks [3,4]. Automatic algorithms for a single broadband three-component station have mainly been developed for detecting P- and S-wave onsets, allowing the source location to be estimated from back-azimuth and apparent surface velocity measurements [5-7], or for seismic moment estimation [8-13].

Kernel-based methods, on the other hand, have become a very powerful tool for mathematicians, scientists and engineers, providing rich and often surprising solutions in areas such as signal processing and pattern recognition [14]. Their implementation is relatively simple: a kernel function combines the input variables through dot products, which implicitly maps them into a new feature space of higher dimension, where classes (in the case of classification) can be separated by a linear function or hyperplane.
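As a toy illustration of this idea (our own example, not the authors'), the degree-2 polynomial kernel (x · z)^2 on two-dimensional inputs equals an ordinary dot product after the explicit mapping phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The kernel obtains the higher-dimensional similarity without ever building phi, which is what makes kernel methods cheap to evaluate.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit feature map for 2-D input such that phi(x) . phi(z) == (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))      # 1.0, computed in the original 2-D space
print(dot(phi(x), phi(z)))     # the same value (up to rounding), computed in 3-D
```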

The study area corresponds to Bogota’s Savannah and its surroundings, where several important fault systems are present, such as Piedemonte Llanero, La Salina, Bogota Savannah and even the Ibague fault system (Fig. 1).

Source: The authors.

Figure 1 Sketch of Bogota’s Savannah and fault systems.

Bogota is also built on soft lacustrine soils [1], which naturally amplify seismic waves and can cause severe damage to infrastructure, similar to the damage observed in Mexico City in the past [15]. The region has high seismicity that can affect Bogota, the country’s capital and its most important social and economic center.

2. Data set used and methods applied

The data set used in this research comes from the El Rosal seismological station, located northwest of Bogota (Fig. 2). This station is part of the Colombian Seismic Network operated by the Servicio Geológico Colombiano - SGC (Colombian Geological Survey).

The Colombian Geological Survey operates a main network of 42 stations that transmit in real time and record seismic activity across the entire country (Fig. 3), with an average distance between stations of 162 kilometers. El Rosal station uses a three-component Guralp CMG-T3E007 sensor and a Nanometrics RD3-HRD24 digitizer, which samples the three channels simultaneously at 24-bit resolution [16]. The data consist of the raw three-component waveforms recorded at this station and a seismic catalogue of 2164 characterized events between January 1, 1998 and October 27, 2008; all these events were located less than 120 kilometers from the station.

Source: The authors.

Figure 2 Distribution of seismic events around Bogota.

Source: Modified from Google Earth.

Figure 3 Seismological Colombian Network.

2.1. Data pre-processing

Before the SVMR processing, waveform files from El Rosal station were converted to American Standard Code for Information Interchange (ASCII) format using a tool from the Seisan package [17]. Earthquakes with magnitudes lower than 2.0 ML were discarded, and the subsequent processing was applied to the remaining 1011 events. Since the selected records have variable noise levels, they were filtered with both high-pass and low-pass filters. Low frequencies correspond to instrumental noise that can easily be removed with a high-pass filter with a cut-off frequency of 0.075 Hz [18], while high frequencies were removed with a low-pass filter with a cut-off frequency of 150 Hz.
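The magnitude selection and the two cut-offs above can be sketched without external libraries. This is a minimal sketch under stated assumptions: the text does not say which filter design was used, so simple one-pole (first-order) IIR filters stand in here, whereas a production pipeline would more likely use a higher-order Butterworth design. The sampling rate is not stated either, so the 400 Hz value is hypothetical; a 150 Hz low-pass is only meaningful when the sampling rate exceeds 300 Hz. The event field name `ml` is also hypothetical.

```python
import math


def highpass(x, fc, fs):
    """One-pole IIR high-pass filter with cut-off fc [Hz] at sampling rate fs [Hz]."""
    rc = 1.0 / (2.0 * math.pi * fc)
    a = rc / (rc + 1.0 / fs)
    y = [0.0] * len(x)
    for n in range(1, len(x)):
        # Passes changes in the input; a constant (DC) offset produces no output.
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y


def lowpass(x, fc, fs):
    """One-pole IIR low-pass filter (exponential smoothing) with cut-off fc [Hz]."""
    if not x:
        return []
    rc = 1.0 / (2.0 * math.pi * fc)
    dt = 1.0 / fs
    alpha = dt / (rc + dt)
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(y[-1] + alpha * (x[n] - y[-1]))
    return y


def preprocess(trace, fs, hp_cutoff=0.075, lp_cutoff=150.0):
    """High-pass at 0.075 Hz, then low-pass at 150 Hz (the cut-offs quoted in the text)."""
    if lp_cutoff >= fs / 2.0:
        raise ValueError("low-pass cut-off must lie below the Nyquist frequency fs/2")
    return lowpass(highpass(trace, hp_cutoff, fs), lp_cutoff, fs)


def select_events(events, min_ml=2.0):
    """Keep only events whose local magnitude ML is at least min_ml."""
    return [e for e in events if e["ml"] >= min_ml]


# Hypothetical usage: the 400 Hz sampling rate and the event records are placeholders.
events = [{"id": 1, "ml": 1.8}, {"id": 2, "ml": 2.0}, {"id": 3, "ml": 3.4}]
kept = select_events(events)
print([e["id"] for e in kept])  # [2, 3]
filtered = preprocess([1.0] * 4000, fs=400.0)  # a constant offset is removed entirely
```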

The statistical distribution of epicenter distances for the whole data set is presented in Fig. 4. The histogram shows that seismic events around El Rosal station occur at distances of 40 kilometers and beyond, with a concentration of events near 90 kilometers. Although the distribution is not homogeneous, it reflects local conditions and the regular behavior of this variable, so the model must perform properly under this condition.

Source: The authors.

Figure 4 Statistical distribution of epicenter distance.

2.2. Descriptors - Input data set of SVMR

In this study, several parameters previously used by other authors for magnitude estimation were calculated and then employed as input variables, or descriptors, for the SVMR algorithm. In the first stage, the relationship between the maximum wave amplitude over a short period of time and the local magnitude of the earthquake [19] was considered. Next, consecutive maximum peaks were identified and a linear regression was fitted to them for each of the three components, capturing not only the maximum peak but also how it changes as energy reaches the sensor. Three parameters were taken from each regression: the slope (M), the intercept (B) and the correlation coefficient (R). The maximum amplitude (Mx) of each component within the time window was also used as a descriptor. Thus, each event had 12 descriptors associated with this concept (four per component).
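The four per-component values (M, B, R, Mx) can be sketched as a least-squares line fitted to a sequence of peak amplitudes. The text does not say how the consecutive peaks were defined, so this sketch assumes, hypothetically, the maximum absolute amplitude in consecutive 0.5-second sub-windows; the helper names are ours.

```python
import math


def linfit(t, y):
    """Ordinary least-squares line y = M*t + B; returns (M, B, R), R = Pearson correlation."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((a - mt) ** 2 for a in t)
    syy = sum((b - my) ** 2 for b in y)
    sty = sum((a - mt) * (b - my) for a, b in zip(t, y))
    m = sty / stt
    b = my - m * mt
    r = sty / math.sqrt(stt * syy) if syy > 0 else 0.0
    return m, b, r


def peak_descriptors(trace, fs, sub_window_s=0.5):
    """(M, B, R, Mx) for one component.

    Peaks are the maximum |amplitude| in consecutive sub-windows of sub_window_s
    seconds; the 0.5 s length is a hypothetical choice, not taken from the paper.
    """
    step = max(1, int(round(sub_window_s * fs)))
    times, peaks = [], []
    for start in range(0, len(trace) - step + 1, step):
        segment = trace[start:start + step]
        peaks.append(max(abs(v) for v in segment))
        times.append((start + step / 2.0) / fs)  # centre of the sub-window [s]
    m, b, r = linfit(times, peaks)
    mx = max(abs(v) for v in trace)  # maximum amplitude over the whole window
    return m, b, r, mx


def magnitude_descriptors(components, fs):
    """12 descriptors: (M, B, R, Mx) for each of the three components."""
    out = []
    for trace in components:
        out.extend(peak_descriptors(trace, fs))
    return out
```

For a 10-second window, `components` would hold the three filtered traces (for example vertical, north-south and east-west) starting at the P arrival.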

Furthermore, nine descriptors previously used for epicenter distance estimation were added. These descriptors were obtained by fitting, through linear regression, the exponential function of eq. (1).

This equation describes the envelope of the seismic record on a logarithmic scale; it was determined by linear regression, together with its correlation coefficient (R), for each component [10].
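Assuming, as in Odaka et al. [10], that eq. (1) models the envelope as env(t) = B t exp(-A t), its parameters follow from an ordinary linear regression after taking logarithms, since ln(env/t) = ln B - A t. This sketch makes that assumption explicit; the exact form of eq. (1), the envelope estimator and the time origin (here, t measured from the P arrival and strictly positive) are not confirmed by the text. Three values (A, B, R) per component would give the nine descriptors mentioned above.

```python
import math


def fit_envelope(t, env):
    """Fit env(t) ~ B * t * exp(-A * t) via least squares on ln(env / t) = ln(B) - A * t.

    Returns (A, B, R), R being the correlation coefficient of the linearised fit.
    Requires every t > 0 and every env > 0.
    """
    x = list(t)
    y = [math.log(e / ti) for ti, e in zip(t, env)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return -slope, math.exp(intercept), r


# Synthetic check: an envelope generated with A = 0.5 and B = 2.0 is recovered.
t = [0.1 * k for k in range(1, 101)]
env = [2.0 * ti * math.exp(-0.5 * ti) for ti in t]
A, B, R = fit_envelope(t, env)
print(round(A, 6), round(B, 6), round(R, 6))  # 0.5 2.0 -1.0
```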

Similarly, some parameters previously used for back-azimuth determination were included to bring information about the source location of the seismic event into the model. The maximum eigenvalues of the two-dimensional covariance matrix were used as input descriptors, calculated as described in [5,20]. A windowing scheme with one-second windows was applied, and a linear regression was fitted to the resulting consecutive values in the same way as described above, giving the slope (M), the intercept (B) and the correlation coefficient (R) of the regression, as well as the arithmetic mean of the eigenvalues (P). Although this parameter involves three components, only four descriptors were added to the process.
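A minimal sketch of these four descriptors, assuming that the two-dimensional covariance matrix is built from the north-south and east-west components in each one-second window (the text does not say which pair) and that the population covariance is used. The largest eigenvalue of a symmetric 2x2 matrix has a closed form, so no linear-algebra library is needed; the function names are hypothetical.

```python
import math


def linfit(t, y):
    """Ordinary least-squares line y = M*t + B; returns (M, B, R)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((a - mt) ** 2 for a in t)
    syy = sum((b - my) ** 2 for b in y)
    sty = sum((a - mt) * (b - my) for a, b in zip(t, y))
    m = sty / stt
    r = sty / math.sqrt(stt * syy) if syy > 0 else 0.0
    return m, my - m * mt, r


def max_eigenvalue_2d(ns, ew):
    """Largest eigenvalue of the 2x2 covariance matrix of two horizontal components."""
    n = len(ns)
    mn, me = sum(ns) / n, sum(ew) / n
    cnn = sum((a - mn) ** 2 for a in ns) / n
    cee = sum((b - me) ** 2 for b in ew) / n
    cne = sum((a - mn) * (b - me) for a, b in zip(ns, ew)) / n
    # Closed form for the symmetric matrix [[cnn, cne], [cne, cee]].
    return (cnn + cee) / 2.0 + math.sqrt(((cnn - cee) / 2.0) ** 2 + cne ** 2)


def eigen_descriptors(ns, ew, fs):
    """(M, B, R, P): regression of per-second maximum eigenvalues on time, plus their mean."""
    step = int(round(fs))  # one-second windows
    times, lams = [], []
    for k, start in enumerate(range(0, len(ns) - step + 1, step)):
        lams.append(max_eigenvalue_2d(ns[start:start + step], ew[start:start + step]))
        times.append(k + 0.5)  # window centre [s]
    m, b, r = linfit(times, lams)
    return m, b, r, sum(lams) / len(lams)
```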

In sum, 25 descriptors were used in the SVMR model for local epicenter distance determination: 12 related to previous work on magnitude estimation, 9 associated with previous epicenter distance estimation and 4 used in back-azimuth determination. These descriptors were calculated for 5-, 10- and 15-second signal windows for the 1011 selected events with magnitudes of 2.0 or greater. All extreme or anomalous values were removed, reducing the data set to 863 events.
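The assembly of the 25-element input vector, and one possible way to discard anomalous events, can be sketched as follows. The text does not state how extreme values were identified, so the z-score rule (dropping any event with a descriptor more than three standard deviations from its column mean) is a hypothetical stand-in, as are the function names.

```python
import math

N_MAGNITUDE, N_ENVELOPE, N_BACKAZIMUTH = 12, 9, 4
N_DESCRIPTORS = N_MAGNITUDE + N_ENVELOPE + N_BACKAZIMUTH  # 25


def build_vector(magnitude, envelope, backazimuth):
    """Concatenate the three descriptor groups into one 25-element input vector."""
    sizes = (len(magnitude), len(envelope), len(backazimuth))
    if sizes != (N_MAGNITUDE, N_ENVELOPE, N_BACKAZIMUTH):
        raise ValueError("expected 12 + 9 + 4 descriptors")
    return list(magnitude) + list(envelope) + list(backazimuth)


def remove_outliers(rows, z_max=3.0):
    """Drop rows with any descriptor farther than z_max standard deviations from its column mean.

    Hypothetical criterion: the paper does not state how anomalous values were identified.
    """
    stats = []
    for column in zip(*rows):
        mu = sum(column) / len(column)
        sd = math.sqrt(sum((v - mu) ** 2 for v in column) / len(column))
        stats.append((mu, sd))
    return [row for row in rows
            if all(sd == 0.0 or abs(v - mu) <= z_max * sd for v, (mu, sd) in zip(row, stats))]
```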

2.3. The SVMR model

The model was trained with the refined data set for each time window using the Weka 3.6 software [21]. SVMR has a solid statistical foundation and can easily be implemented at the station on electronic processing boards. After several tests, a standard normalized polynomial kernel was selected. To choose the kernel exponent and the complexity factor, the correlation coefficients and mean absolute errors obtained by 10-fold cross-validation were compared, testing multiple combinations of exponents and complexity factors for different magnitudes and signal lengths. The correlation coefficient calculated for each partition is Pearson’s coefficient, which measures the linear relationship between two variables independently of their scales. It takes values between -1 and 1; a value of zero means that no linear relationship between the two variables was found. A positive value means that both variables change in the same direction, i.e. high values of one variable correspond to high values of the other, and vice versa. The closer its absolute value is to one, the stronger the linear relationship between the two variables.
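Pearson's coefficient, as described above, can be computed directly from its definition; this short sketch (helper name ours) also shows that it is insensitive to the scale and offset of either variable.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


real = [1.0, 2.0, 3.0, 4.0]
print(pearson(real, [2.0, 4.0, 6.0, 8.0]))   # 1.0: perfect positive linear relation
print(pearson(real, [8.0, 6.0, 4.0, 2.0]))   # -1.0: perfect negative linear relation
```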

3. Results

Using the descriptors described above and the catalogue magnitude of each seismic event, 12 datasets were evaluated. Each dataset corresponds to a combination of 4 minimum magnitude filters (2.0, 2.5, 3.0 and 3.5) and 3 signal lengths (5, 10 and 15 s); for each dataset, 7 kernel exponent values were combined with 6 complexity factor values, giving a total of 504 SVMR models tested to find the parameter combination with the best correlation coefficient for epicenter distance determination.
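The size of this search follows from 4 x 3 x 7 x 6 = 504, and each model was scored by 10-fold cross-validation. In the sketch below the magnitude and window values come from the text, but the exponent and complexity lists are hypothetical placeholders: the text gives only their sizes and the values finally chosen. The fold splitter uses contiguous folds for clarity and assumes the events were shuffled beforehand (Weka randomizes the data before splitting).

```python
import itertools

MIN_MAGNITUDES = [2.0, 2.5, 3.0, 3.5]   # cut-off magnitudes, from the text
WINDOWS_S = [5, 10, 15]                 # signal lengths in seconds, from the text
# Hypothetical grids: only their sizes (7 and 6) are given in the text.
EXPONENTS = [2, 4, 6, 8, 10, 12, 14]
COMPLEXITIES = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]

grid = list(itertools.product(MIN_MAGNITUDES, WINDOWS_S, EXPONENTS, COMPLEXITIES))
print(len(grid))  # 504 = 4 * 3 * 7 * 6


def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

With the 863 refined events, `kfold_indices(863)` gives three folds of 87 events and seven of 86.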

Fig. 5 shows the correlation coefficients obtained for each combination of cut-off magnitude and signal length, across the tested kernel exponents and complexity factors; green squares indicate higher correlation and red squares lower correlation. The combinations with the highest correlation coefficient were then selected. Fig. 5 also shows that the best correlation coefficients, averaging about 0.7, are obtained for 10- and 15-second signals.

Source: The authors.

Figure 5 Correlation coefficients for each combination of kernel parameters.

A correlation of about 0.7 is acceptable and indicates that the model predicts the epicenter distance with good accuracy. For 5-second signals, the correlation coefficients are only slightly above 0.5, suggesting that this window is not long enough for an acceptable epicenter distance estimate.

Although the 15-second signal yields a higher correlation coefficient, the 10-second signal was selected because it gives the best model within the shortest time. A cut-off magnitude must still be established for the 10-second signal to ensure accuracy in the final epicenter distance estimate.

The parameter selection is shown in Fig. 6, which presents the correlation coefficient and mean absolute error for each combination of kernel exponent and complexity factor, for the selected signal length and cut-off magnitude.

Source: The authors.

Figure 6 Parameter selection for each dataset.

For a 10-second signal and a cut-off magnitude of 3.0, Table 1 shows a mean value of 0.45 kilometers with a standard deviation of 9 kilometers; this cut-off magnitude was the one finally implemented in the model.

Table 1 Summary for the best epicenter distance models in each combination.

Source: The authors.

These parameters were calculated with the SVMR algorithm in Weka 3.6, using a standard normalized polynomial kernel and 10-fold cross-validation. The quality factors indicate an accuracy of 10.9 kilometers in epicenter distance under cross-validation, which also serves to verify the model.

Based on these results, a support vector machine with a normalized polynomial kernel can be implemented using an exponent of 10 and a complexity factor of 0.8. The standard deviation was 10.3 kilometers, which is good enough for early warning, considering that most seismic events are located farther than 40 kilometers from El Rosal station.
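The normalized polynomial kernel can be written as K(x, y) / sqrt(K(x, x) K(y, y)), where K is a polynomial kernel; this is how we understand Weka's NormalizedPolyKernel. The authors' exact option settings (for example, whether lower-order terms were included) are not stated, so both variants are exposed through a hypothetical `lower_order` flag. In the homogeneous form the kernel reduces to the cosine of the angle between two descriptor vectors raised to the exponent, so an exponent of 10 makes the similarity fall off quickly as vectors diverge. The complexity factor of 0.8 belongs to the SVR optimization, not to the kernel. The exponent is assumed to be an integer.

```python
import math


def poly_kernel(x, y, exponent, lower_order=False):
    """Polynomial kernel (x.y)^p, or (x.y + 1)^p when lower_order is True."""
    d = sum(a * b for a, b in zip(x, y))
    return (d + 1.0) ** exponent if lower_order else d ** exponent


def normalized_poly_kernel(x, y, exponent=10, lower_order=False):
    """K(x, y) / sqrt(K(x, x) * K(y, y)); equals 1 when x == y (x non-zero)."""
    kxx = poly_kernel(x, x, exponent, lower_order)
    kyy = poly_kernel(y, y, exponent, lower_order)
    if kxx == 0.0 or kyy == 0.0:
        return 0.0  # guard against all-zero descriptor vectors
    return poly_kernel(x, y, exponent, lower_order) / math.sqrt(kxx * kyy)


# With exponent 10, vectors 45 degrees apart have similarity cos(45 deg)^10 = 1/32.
print(normalized_poly_kernel([1.0, 0.0], [1.0, 1.0]))  # 0.03125
```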

Fig. 7 shows the cross-plot of the real epicenter distance (X axis) against the distance calculated by the model (Y axis). The residuals follow an approximately normal distribution, confirming that the error of the calculated distance is around 10.9 kilometers. The dashed blue line is the 1:1 line, the locus where the predicted value equals the real value.
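The quantities used to judge the cross-plot (mean residual, standard deviation of the residuals and mean absolute error) can be computed as in the sketch below. The text does not say whether the sample or the population standard deviation was reported; the sketch uses the sample (n - 1) form, and the values are toy numbers, not data from the study.

```python
import math


def residual_stats(real, predicted):
    """Mean residual, sample standard deviation of residuals and mean absolute error."""
    residuals = [p - r for r, p in zip(real, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in residuals) / (n - 1))
    mae = sum(abs(e) for e in residuals) / n
    return mean, sd, mae


# Toy values in kilometers (not data from the study).
real = [50.0, 60.0, 70.0, 80.0]
predicted = [55.0, 55.0, 75.0, 75.0]
mean, sd, mae = residual_stats(real, predicted)
print(mean, round(sd, 2), mae)  # 0.0 5.77 5.0
```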

Source: The authors.

Figure 7 Correlation between real and calculated epicenter distance with SVMR.

These results show that the model works properly, predicting the distance at which earthquakes occur with good accuracy from only a 10-second signal, which is short enough for an early warning system.

4. Conclusions

The SVMR model proposed here is an important step toward implementing an earthquake early warning system for Bogota, Colombia, and for other population centers in the world.

The result shown in this study improves on those of [22] and [23], who reported accuracies of ±15 kilometers and ±16-19 kilometers, respectively.

This model was proposed and evaluated for fast epicenter distance determination, based on support vector machine regression through pattern recognition and characterization of earthquake signals recorded at a three-component seismic station, using only ten seconds of signal, in order to anticipate the arrival of earthquakes in the city of Bogota. Seismic waves take at least 30 seconds to travel from the main seismic alignments to the city, which allows an early warning to be issued, provided it is generated in less than 10 seconds. Additionally, this model can be implemented directly at the seismological station on embedded electronic devices, where the main mathematical process is a simple matrix product involving the kernel for the epicenter distance and a vector containing the calculated descriptors of the current event.
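In a trained SVR, the prediction for a new descriptor vector is a weighted sum of kernel evaluations against the stored support vectors plus a bias term; computing that kernel row and taking its dot product with the coefficient vector is the simple matrix product mentioned above. In this sketch the support vectors, coefficients and bias are hypothetical placeholders that would come from the trained Weka model, and any descriptor scaling applied during training (Weka's SMOreg normalizes attributes by default, as far as we know) would have to be applied to the new descriptors as well.

```python
import math


def normalized_poly_kernel(x, y, exponent=10):
    """Homogeneous normalized polynomial kernel (x.y)^p / sqrt((x.x)^p (y.y)^p)."""
    dxy = sum(a * b for a, b in zip(x, y))
    dxx = sum(a * a for a in x)
    dyy = sum(b * b for b in y)
    if dxx == 0.0 or dyy == 0.0:
        return 0.0
    return dxy ** exponent / math.sqrt(dxx ** exponent * dyy ** exponent)


def predict_distance(descriptors, support_vectors, coefficients, bias, exponent=10):
    """SVR prediction f(x) = sum_i coef_i * K(sv_i, x) + b.

    The kernel row K(sv_i, x) multiplied by the coefficient vector is the
    matrix-vector product an embedded device would evaluate for each new event.
    """
    kernel_row = [normalized_poly_kernel(sv, descriptors, exponent) for sv in support_vectors]
    return sum(c * k for c, k in zip(coefficients, kernel_row)) + bias


# Toy model with two hypothetical 2-D support vectors (a real model would use 25 descriptors).
svs = [[1.0, 0.0], [0.0, 1.0]]
coefs = [2.0, 3.0]
print(predict_distance([1.0, 0.0], svs, coefs, bias=1.0))  # 3.0
```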

5. Recommendations

Despite the good results obtained for epicenter distance determination, the model introduced here should be developed further in order to calculate other hypocentral parameters.

Further research should seek ways to improve prediction accuracy, supported by computational intelligence and geophysics research groups as well as by the seismological network in Bogota’s Savannah and its surroundings managed by the Universidad Nacional de Colombia.

Other descriptors, such as the predominant period and Fourier and wavelet frequency spectra, should be considered to obtain higher correlation coefficients and better estimates of local magnitude and other hypocentral parameters, which are required for a reliable and fast earthquake early warning system.

Datasets should be complemented with recent seismic events, specifically from October 27, 2008 to the present, as this period includes a larger number of earthquakes with magnitudes greater than 3.0.

Acknowledgments

The authors are grateful to the Servicio Geológico Colombiano (SGC) for providing the dataset used in this study and to Universidad Nacional de Colombia for supporting our efforts to achieve a fast and reliable early warning system for Bogota D.C. - Colombia.

References

[1] Ojeda, A., Martinez, S., Bermudez, M. and Atakan, K., The new accelerograph network for Santa Fé de Bogotá, Colombia. Soil Dynamics and Earthquake Engineering, October-December, 22(9-12), pp. 791-797, 2002. DOI: 10.1016/S0267-7261(02)00100-8

[2] Leet, L.D. and Judson, S., Fundamentos de geología física. México: Editorial Limusa, 2000.

[3] Ochoa, L.H., Niño, L.F. and Vargas, C.A., Severity classification of a seismic event based on the magnitude-distance ratio using only one seismological station. Earth Sciences Research Journal, December, 18(2), pp. 115-122, 2014. DOI: 10.15446/esrj.v18n2.41083

[4] Ochoa, L.H., Niño, L.F. and Vargas, C.A., Fast magnitude determination using a single seismological station record implementing machine learning techniques. Geodesy and Geodynamics, March, pp. 1-8, 2017. DOI: 10.1016/j.geog.2017.03.010

[5] Magotra, N., Ahmed, N. and Chael, E., Seismic event detection and source location using single station (three components) data. Bull. Seism. Soc. Am., June, 77(3), pp. 958-971, 1987.

[6] Roberts, R.G., Christoffersson, A. and Cassidy, F., Real-time detection, phase identification and source location estimation using single station three component seismic data. Geophysical Journal, June, 97(3), pp. 471-480, 1989. DOI: 10.1111/j.1365-246X.1989.tb00517.x

[7] Saita, J. and Nakamura, Y., The early warning systems for mitigation of disasters caused by earthquakes and tsunamis. In: Zschau, J. and Kuppers, A., eds. Early warning systems for natural disaster reduction. Berlin: Springer-Verlag, pp. 453-460, 2003. DOI: 10.1007/978-3-642-55903-7_58

[8] Talandier, J., Reymond, D. and Okal, E.A., Use of variable mantle magnitude for the rapid one-station estimation of teleseismic moments. Geophysical Research Letters, August, 14(8), pp. 840-843, 1987. DOI: 10.1029/GL014i008p00840

[9] Reymond, D., Hyvernaud, O. and Talandier, J., Automatic detection, location and quantification of earthquakes. Pure and Applied Geophysics, March, 135(3), pp. 361-382, 1991. DOI: 10.1007/BF00879470

[10] Odaka, T. et al., A new method for quickly estimating epicentral distance and magnitude from a single seismic record. Bull. Seism. Soc. Am., February, 93(1), pp. 526-532, 2003. DOI: 10.1785/0120020008

[11] Horiuchi, S. et al., An automatic processing system for broadcasting earthquake alarms. Bull. Seism. Soc. Am., April, 95(2), pp. 708-718, 2005. DOI: 10.1785/0120030133

[12] Wu, Y.M., Shin, T.C. and Tsai, Y.B., Quick and reliable determination of magnitude for seismic early warning. Bull. Seism. Soc. Am., October, 88(5), pp. 1254-1259, 1998.

[13] Espinosa-Aranda, J.M. et al., Mexico City seismic alert system. Seismol. Res. Lett., November, 66(6), pp. 42-53, 1995. DOI: 10.1785/gssrl.66.6.42

[14] Shawe-Taylor, J. and Cristianini, N., Kernel methods for pattern analysis. First ed. Cambridge, United Kingdom: Cambridge University Press, 2004.

[15] Rodríguez, J.A., Ramirez, F. and Escallon, J.P., Geotechnical seismic characterization for the microzonation of Bogotá. Thessaloniki, Greece, 4th International Conference on Earthquake Geotechnical Engineering, 2007.

[16] Bermúdez, M.L. and Rengifo, F., EL ROSAL: La Estación Sismológica del CTBTO en Colombia. Bogota, Primer Simposio Colombiano de Sismología, 2002, 8 p.

[17] Ottemoller, L., Voss, P. and Havskov, J., Seisan earthquake analysis software for Windows, Solaris, Linux and MacOSX, s.l.: s.n., 2016.

[18] Wu, Y.M. and Zhao, L., Magnitude estimation using the first three seconds P-wave amplitude in earthquake early warning. Geophys. Res. Lett., August, 33(16), pp. L16312, 2006. DOI: 10.1029/2006GL026871

[19] Wu, Y.M. and Kanamori, H., Rapid assessment of damage potential of earthquakes in Taiwan from beginning of P waves. Bull. Seism. Soc. Am., February, 93(1), pp. 526-532, 2005. DOI: 10.1785/0120040193

[20] Magotra, N., Ahmed, N. and Chael, E., Single-station seismic event detection and location. IEEE Transactions on Geoscience and Remote Sensing, January, 27(1), pp. 15-23, 1989. DOI: 10.1109/36.20270

[21] Frank, E., Hall, M.A. and Witten, I.H., The WEKA workbench. Online appendix for "Data mining: Practical machine learning tools and techniques", Fourth Edition. s.l.: Morgan Kaufmann, 2016.

[22] Lockman, A.B. and Allen, R.M., Single-station earthquake characterization for early warning. Bull. Seism. Soc. Am., December, 95(6), pp. 2029-2039, 2005. DOI: 10.1785/0120040241

[23] Böse, M., Heaton, T.H. and Hauksson, E., Rapid estimation of earthquake source and ground-motion parameters for earthquake early warning using data from a single three-component broadband or strong-motion sensor. Bulletin of the Seismological Society of America, 102(2), pp. 738-750, 2012. DOI: 10.1785/0120110152

How to cite: Ochoa, L.H., Niño, L.F. and Vargas, C.A., Fast estimation of earthquake epicenter distance using a single seismological station with machine learning techniques. DYNA, 85(204), pp. 161-168, March, 2018.

Received: October 19, 2017; Revised: November 28, 2017; Accepted: December 15, 2017

This is an open-access article distributed under the terms of the Creative Commons Attribution License
