Automatic determination of the Atterberg limits with machine learning

Rosas, David Antonio; Burgos, Daniel; Branch, John Willian; Corbi, Alberto

doi:10.15446/dyna.v89n224.102619

DYNA

Print version ISSN 0012-7353; On-line version ISSN 2346-2183

Dyna rev.fac.nac.minas vol.89 no.224 Medellín Oct./Dec. 2022 Epub Feb 10, 2023

https://doi.org/10.15446/dyna.v89n224.102619

Article

Automatic determination of the Atterberg limits with machine learning

Determinación automática de los límites de Atterberg con machine learning

David Antonio Rosas^a
http://orcid.org/0000-0002-9722-2659

Daniel Burgos^a^b
http://orcid.org/0000-0003-0498-1101

John Willian Branch^b
http://orcid.org/0000-0002-0378-028X

Alberto Corbi^a
http://orcid.org/0000-0002-7282-4557

^a Research Institute for Innovation & Technology in Education (UNIR iTED), Universidad Internacional de La Rioja (UNIR), Logroño, La Rioja, Spain. davidantonio.rosas@unir.net, daniel.burgos@unir.net, alberto.corbi@unir.net

^b Universidad Nacional de Colombia, Sede Medellín, Facultad de Minas, Departamento de Ciencias de la Computación y de la Decisión, Medellín, Colombia. jwbranch@unal.edu.co

Abstract

In this study, we determine the liquid limit (W_l), plasticity index (PI), and plastic limit (W_p) of several natural fine-grained soil samples with the help of machine-learning and statistical methods. This enables us to locate each soil type analysed in the Casagrande plasticity chart with a single measurement in pressure-membrane extractors. Compared with standardised methods, the machine-learning models fitted the liquid limit well enough for design purposes. Similar fits were achieved in the determination of the plasticity index, whereas the plastic limit determinations were suitable for control works. Because the best techniques were based on Multiple Linear Regression and Support Vector Machines Regression, they provide explainable plasticity models. In this sense, W_l = (9.94 ± 4.2) + (2.25 ± 0.3) ∙pF_4.2, PI = (-20.47 ± 5.6) + (1.48 ± 0.3) ∙pF_4.2 + (0.21 ± 0.1) ∙F, and W_p = (23.32 ± 3.5) + (0.60 ± 0.2) ∙pF_4.2 - (0.13 ± 0.04) ∙F. We therefore propose an alternative, automatic, multi-sample, and static method to address current issues in the determination of the Atterberg limits with standardised tests.

Keywords: machine learning; Atterberg limits; pressure-membrane extractor; determination; soils

Resumen

En este estudio, determinamos el límite líquido (W_l), el índice de plasticidad (PI) y el límite plástico (W_p) de suelos naturales finos con ayuda de machine-learning y métodos estadísticos. Ello permite localizarlos en la Carta de Plasticidad de Casagrande con una sola medida en extractores de presión-membrana. Los modelos de machine-learning mostraron ajustes en la determinación de W_l apropiados para propósitos de diseño, comparados con métodos estandarizados. Ajustes similares se alcanzaron en la determinación de PI, mientras que las determinaciones de W_p permiten ajustes apropiados para trabajos de control. Debido a que las técnicas más apropiadas se basaron en Regresión Lineal Múltiple y Máquinas de Soporte de Vectores, aportaron modelos de plasticidad explicables. En este sentido, W_l = (9.94 ± 4.2) + (2.25 ± 0.3) ∙pF_4.2, PI = (-20.47 ± 5.6) + (1.48 ± 0.3) ∙pF_4.2 + (0.21 ± 0.1) ∙F, y W_p = (23.32 ± 3.5) + (0.60 ± 0.2) ∙pF_4.2 - (0.13 ± 0.04) ∙F. Por consiguiente, proponemos un método alternativo, automático, estático y multimuestra para enfrentar problemas frecuentes en la determinación de los Límites de Atterberg con ensayos normalizados.

Palabras clave: machine learning; límites de Atterberg; extractor de presión membrana; determinación; suelo

1. Introduction

Since the early works of Albert Atterberg (1846-1916) and Arthur Casagrande (1902-1981), plasticity has been recognised as a fundamental characteristic of fine-grained soils [¹,²]. The consistency of such soils changes with increasing moisture content through the solid, semi-solid, plastic, and liquid states, and the arbitrary borders between these states are termed the shrinkage limit, the plastic limit (W_p), and the liquid limit (W_l), respectively [³]. Moreover, the difference between W_l and W_p is named the Plasticity Index (PI). Together with granulometric analysis, the determination of the plasticity of a soil allows it to be classified according to international systems, such as the USCS [⁴] and the AASHTO [⁵,⁶], among other technical applications in soil science. There are various laboratory methods for determining W_l, such as the Casagrande device [⁷] or penetrometer tests [⁸], as well as a method for the determination of W_p [⁷]. Nevertheless, to date there is no known static, multi-sample, and automatic test able to determine W_l, W_p, and PI directly with a single measurement. In this sense, the method developed by [⁹] needs two measurements, subjecting sieved samples to suction pressures of −1,500 kPa (pF 4.2) and −33 kPa (pF 2.5) inside pressure-membrane extractors [¹⁰].

Regarding W_l measurements made with the percussion-cup method, [¹¹] showed that they can vary depending on the technician, the laboratory, the device used, and the material of the base that the cup strikes. The correct calibration of the device, the cadence of the blows, and the grooving tool used also affect this reproducibility [¹²]. In addition, greater reproducibility was found in tests carried out with penetrometers [⁸]. Regarding the determination of W_p using the rolling test, [¹¹] showed that its reproducibility is even lower than that obtained for W_l, which points to a subjective component in the measurements. Moreover, [¹³] stated that W_p is a measure of soil brittleness. These authors also revealed that this behaviour depends on the continuity of water circulation inside the soil threads and is therefore linked to the occurrence of cavitation. Furthermore, PI is related to friction and permeability in fine-grained soils: friction decreases when PI increases, whereas permeability exhibits the opposite behaviour [¹⁴]. Next, from the instructions of [¹⁵,¹⁶] for classifying soils, it can be tacitly and qualitatively deduced that the ability of a soil to retain water increases with its plasticity, through a characteristic called dilatancy that is assessed by means of a simple and well-known procedure. Besides, [¹⁷] established that W_l and PI are mainly and strongly influenced by the ability of clay minerals to interact with liquids. Since the water-holding capacity of the soil seems to increase with its plasticity [¹⁸], we ask whether a pressure-membrane apparatus, which quantifies that capacity [¹⁰], can be used to determine the Atterberg limits. Therefore, in this study, we test 23 fine-grained soil samples from the Betic Cordillera (Spain), with W_l values between 26% and 62%. Subsequently, we apply conventional statistical techniques and machine learning to the results obtained with the percussion-cup test and the thread-rolling test [¹⁹] to identify models that can determine W_l, W_p, and PI using only one measurement with Richards extractors at −1,500 kPa.

2. Materials and methods

The laboratory and data analysis techniques used as well as the geological locations of the soil samples analysed are described below.

2.1. Geological and geographical locations of the samples

The samples were obtained from the towns of Vélez-Málaga, Granada, Jaén, Linares, Baza, and Caravaca de la Cruz, in the Betic Cordillera (Spain). Their geotechnical classifications and other characteristics are collected in Table 1; each location is identified by its initials, as shown in Fig. 1.

Table 1 Experimental results for the samples analysed.

Soil	Wl (%)	Wp (%)	pF25 (%)	pF42 (%)	PI (%)	F (%)	USCS	AASHTO	GI
J1	44.0	20.4	27.6	18.3	23.8	92.1	CL	A-7-A-7-A	20
J2	34.8	18.8	23.3	10.3	15.9	91.0	CL	A-6	9
G1	41.1	25.7	26.4	16.5	15.4	96.3	ML	A-7-A-7	1
G2	36.8	18.4	23.7	12.4	18.3	89.4	CL	A-6	15
G3	36.6	18.6	26.4	12.7	17.9	91.1	CL	A-6	13
G4	42.8	20.9	26.1	15.1	21.9	93.7	CL	A-7-A-7	21
VM	28.4	27.0	23.5	10.9	1.43	41.9	SM	A-2-4	0
L1	30.1	23.9	17.9	7.5	6.20	84.2	ML	A-4	4
C1	52.4	26.1	25.6	15.1	26.2	88.1	CH	A-7-A-7-A	20
C2	37.8	23.2	23.7	12.0	14.6	63.1	SC	A-2-6	0
C3	38.7	25.0	25.7	15.5	13.7	82.6	SM	A-6	3
C4	41.5	17.3	24.1	14.3	24.2	98.0	CL	A-7-A-7-A	11
C5	40.3	24.5	23.9	14.3	15.7	95.5	CL	A-7-A-7-A	8
C6	43.5	22.8	23.5	15.3	20.7	89.5	CL	A-7-A-7-A	15
C7	50.5	25.8	25.9	16.3	24.7	87.5	CH	A-7-A-7-A	14
J3	47.3	24.4	24.7	14.3	22.9	96.2	CL	A-7-A-7-A	20
J4	46.6	23.7	27.0	15.4	22.8	96.9	CL	A-7-A-7-A	26
J5	61.8	25.7	28.9	18.0	36.1	99.3	CH	A-7-A-7-A	42
B1	29.1	19.1	22.4	10.2	10.0	85.5	SC	A-6	2
B2	32.6	18.2	23.3	10.3	14.4	97.4	CL	A-6	14
B3	26.0	20.6	22.7	6.4	5.4	58.7	CL-ML	A-4	0
B4	32.2	18.1	24.2	9.6	14.1	97.6	CL	A-6	14
B5	45.7	19.7	28.7	16.7	25.9	97.6	CL	A-7-A-7-A	27

Source: the authors.

Source: simplified scheme of Sanz de Galdeano et al., 2007.

Figure 1 Geological locations of the samples in the Betic Cordillera.

The Betic Cordillera is located in the south-east of Spain, in a sloping strip that extends from Cádiz to the Balearic Islands, and, together with the Rif, it constitutes the westernmost Alpine mountain chain in the Mediterranean [²⁰].

2.2. Laboratory techniques

In the next sections, we describe the determination of W_l, W_p, and PI. Then we present the fundamentals of the pressure-membrane apparatus used to measure soil water retention, followed by the measurement procedure.

2.2.1. Atterberg limits measurement

All the soils analysed were extracted via undisturbed sampling using a push-in Shelby tube sampler with a 101.6 mm inner diameter and a 1.63 mm wall thickness. They were delivered undisturbed, in PVC pipes, to a testing laboratory, where we classified the soils, performed granulometry, and determined the liquid limit and plastic limit of the samples.

With a part of the sample, granulometry was performed through sieving following the NLT 104/91 standard [⁴].

The second part of the sample was sieved with an N40 ASTM sieve to determine the Atterberg limits, and the sieved fraction was separated into two portions by quartering. With the first half, we applied the Casagrande device method, which allowed us to determine W_l following the ASTM D 4318-84 standard [¹⁹].

Next, the moisture of the precise part of the sample that slid into the groove was measured by oven-drying at 105°C, expressed as weight percentage (H), by weight difference before (P_w) and after drying (P_s), following eq. (1).
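Eq. (1) itself is not reproduced in this text. As a hedged reading only, and assuming that P_w and P_s are tare-corrected weights and that moisture is referred to the dry weight (the usual convention in gravimetric water-content tests, not confirmed by the source), it would take the form:

```latex
\[
  H = \frac{P_w - P_s}{P_s} \times 100
\]
```

For example, under this assumption a specimen weighing 25.0 g before drying and 20.0 g after drying would have H = 25%.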

On the other hand, the plastic limit (W_p) was determined following the ASTM D 4318-84 standard [¹⁹].

In addition, PI is defined by [¹⁹] as the difference between W_l and W_p (eq. 2): PI = W_l − W_p.

Once W_l and PI of each sample were determined, the Casagrande plasticity chart allowed the soils to be classified according to their plasticity (see Fig. 2). Consequently, with granulometry, W_l, and PI, the soils were classified using the USCS [⁵] and AASHTO [⁶] systems.

Source: the authors, following the ASTM D 2487-06 standard.

Figure 2 Locations of the samples in the plasticity chart.
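As an illustration of how W_l and PI place a sample in the plasticity chart, the following minimal sketch encodes the A-line, PI = 0.73 ∙ (W_l − 20), and the W_l = 50% boundary used in the USCS fine-grained groups. The function names are hypothetical, and the sketch deliberately ignores the coarse fraction, organic soils, and the U-line, so it only mirrors the chart and not the full classification of Table 1.

```python
def a_line(liquid_limit):
    """Plasticity index (%) on the A-line for a given liquid limit (%)."""
    return 0.73 * (liquid_limit - 20.0)


def classify_fines(liquid_limit, plasticity_index):
    """Return the plasticity-chart zone (CH, MH, CL, CL-ML or ML).

    Hypothetical helper: chart position only, no granulometry or organics.
    """
    on_or_above_a_line = plasticity_index >= a_line(liquid_limit)
    if liquid_limit >= 50.0:
        return "CH" if on_or_above_a_line else "MH"
    if on_or_above_a_line and plasticity_index > 7.0:
        return "CL"
    if on_or_above_a_line and plasticity_index >= 4.0:
        return "CL-ML"
    return "ML"


# A few (Wl, PI) pairs taken from Table 1.
samples = {"J5": (61.8, 36.1), "J1": (44.0, 23.8), "B3": (26.0, 5.4), "L1": (30.1, 6.2)}
zones = {name: classify_fines(wl, pi) for name, (wl, pi) in samples.items()}
print(zones)
```

For these four samples the sketch returns the same groups that Table 1 lists in its USCS column.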

2.2.2. Soil water holding capacity

To clarify some aspects of the pressure-membrane extractors used in this study, the matric potential (F) represents the tension that soils exert in sorption phenomena. In this regard, [²¹] defined F as the free energy change in a unit volume of water when isothermally transferred from the soil water state to the free water state. Moreover, [²²] reported that the decimal logarithm of the absolute value of F, expressed as the height of a water column in centimetres, is the usual measure of the tension applied to the soils, and it is referred to as pF (eq. 3): pF = log₁₀ |F|.
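The relation between suction pressure and pF can be checked numerically. The sketch below assumes the standard conversion 1 kPa ≈ 10.1972 cm of water column; the function name is hypothetical.

```python
import math

CM_H2O_PER_KPA = 10.1972  # assumed conversion: 1 kPa is about 10.1972 cm of water


def suction_kpa_to_pf(suction_kpa):
    """Convert a suction pressure in kPa to pF (log10 of the head in cm of water)."""
    head_cm = abs(suction_kpa) * CM_H2O_PER_KPA
    return math.log10(head_cm)


pf_high = suction_kpa_to_pf(-1500.0)
pf_low = suction_kpa_to_pf(-33.0)
print(round(pf_high, 1), round(pf_low, 1))
```

Rounded to one decimal, the two suctions used in this study correspond to pF 4.2 and pF 2.5.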

In this paper, we assume the generalized Soil Water Retention Curve (SWRC) equation described by [²¹]. In this model, water covers the mineral surfaces because of adsorption phenomena and is held by capillary effects in pores of different diameters. This is represented in eq. (4):

θ(ψ) = θ_a(ψ) + θ_c(ψ) (4)

where the total water retained (θ) equals the adsorbed water (θ_a) plus the capillary water (θ_c) under the prevailing suction ψ.

The adsorbed water mainly depends on the specific surface area of the minerals of the soil as well as the cation exchange capacity (CEC) of its clays, colloids, and organic matter [²³,²⁴]. According to the BET theory [²⁵], this moisture could be represented as a multilayer of sorbate covering the solid phase of a soil, in which Van der Waals forces, electrically charged particles, and osmotic and hydration components are involved [²⁶].

Furthermore, [²⁷] computed the adsorbed water θ_a(ψ) with eq. (5):

where ψ is the suction pressure, θ_{a max} is the adsorption capacity, ψ_max is the highest matric suction, and m is the adsorption strength. In both the BET theory and Lu’s model [²⁷], adsorbed water comprises a tightly adsorbed component closest to the soil particles and a less intensely bonded film of adsorbed water.

Moreover, [²⁷] also determined the capillary water retention θ_c(ψ), given by eq. (6):

It depends on porosity or saturated water content (θ_s), air entry suction (α⁻¹), pore size distribution (n), and mean cavitation suction (ψ_c).

Moreover, water sorption-desorption curves are affected by hysteresis, yielding that θ(ψ) is greater when the soils absorb moisture than when drying [²⁸]. Thus, the samples were tested before saturation.

To sum up, when a saturated sample is subjected to a suction pressure of −1,500 kPa, a fraction of its water is evacuated through the porous disk of the Richards apparatus, so that the sample retains adsorbed moisture and water in pores up to 0.2 μm in diameter [²⁹]. On the other hand, saturated samples subjected to −33 kPa contain the moisture retained at −1,500 kPa plus the slow-flow gravitational water, which remains in pores between 0.2 and 8 μm in diameter [²⁹]. Likewise, [³⁰] related shear strength to the SWRC for unsaturated soils. Moreover, since W_l is determined with shear strength-based tests and W_p is related to cavitation [¹³], we measure the soil water holding capacity next.

2.2.3. Soil water holding capacity measurement

Soil water holding capacity is measured with a Richards extractor, a device made up of hermetic circular chambers containing 300 mm porous porcelain plates with membranes. It tests the samples by applying different suction pressures using a compressor, pipes, and gauges [¹⁰].

In more detail, the porcelain disks were saturated in deionized water for 24 hours. Then two quartered batches were taken from the dry soil samples, previously sieved with an ASTM N40 sieve. Those samples were then placed carefully in rubber rings on the porcelain disks. Next, deionized water was poured onto the porcelain disks and they were left to rest in expanded polystyrene chambers for 2 days so that the samples could absorb the moisture. Subsequently, both batches of soil samples were subjected to the required suction pressure for 2 days in independent chambers of the Richards extractor. The water that the samples did not retain was evacuated through the porous disks and membranes. Finally, the moisture retained by the samples after the process was determined by weight difference after drying in an oven at 105°C, according to eq. (1). For this, weighing bottles with lids, spatulas, and washing bottles were used.

In this work, the suctions applied (ψ) were −1,500 kPa (pF = 4.2) and −33 kPa (pF = 2.5). Consequently, the moisture content of a soil, expressed as a percentage by weight at the applied pressure, is denoted as pF_4.2 and pF_2.5 in tables and machine-learning model equations.

3. Calculation

For data analysis, SPSS 25 [³¹] and a Multiple Classifier System (MCS) are used together [³², ³³]. The MCS combines regression algorithms from heterogeneous theoretical backgrounds, programmed with Python 3.0 [³⁴]: Multiple Linear Regression (MLR), Decision Tree Regression (DTR), Random Forest Regression (RFR), and Support Vector Machines Regression (SVMR). To this end, we used the Pandas [³⁵], Seaborn [³⁶], Matplotlib [³⁷], Scikit-learn [³⁸], and Statsmodels [³⁹] libraries. The commented source code is suitable for the Jupyter Notebook environment [⁴⁰]. The starting data in CSV format and a code example can be consulted in the GitHub repository cited in Appendix A.

First, MLR is coded since it provides intelligible models. In this regard, [⁴¹] implemented MLR models according to eq. (7):

ŷ(w, x) = w₀ + w₁∙x₁ + … + w_n∙x_n (7)

which represents a linear model with coefficients w = (w₀, w₁, …, w_n) and characteristics X = (x₁, x₂, …, x_n), where ŷ is the variable predicted by the model.

To do this, scikit-learn [⁴¹] minimises the residual sum of squares between the observed target values and those predicted by the linear model, solving the following minimisation problem (eq. 8):

min_w ‖X∙w − y‖₂² (8)

On the other hand, the Statsmodels library [³⁹] implements the ordinary least squares technique, which fits the data model and provides additional statistical parameters used in our MCS. Furthermore, the parameters of the models and their statistical fits were evaluated at a level of statistical significance of α = 0.05.

Besides, DTR was implemented with scikit-learn. Briefly, DTR algorithms generate recursive partitions of the feature space, following rules that maximize the differentiation of the splits. These splits set the nodes of a tree structure, so that a DTR model learns local linear regressions on the basis of such nodes [⁴¹]. The depth of the tree was left at its default value in the code.
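To make the ordinary least squares step concrete, the following dependency-free sketch fits a single-characteristic model W_l = w₀ + w₁ ∙ pF_4.2 to the values listed in Table 1 and also returns the standard errors and R². It is not the authors' code (they use Statsmodels); the function name and the dictionary keys are hypothetical.

```python
import math

# pF4.2 (%) and Wl (%) for the 23 samples, copied from Table 1 (rows J1 ... B5).
PF42 = [18.3, 10.3, 16.5, 12.4, 12.7, 15.1, 10.9, 7.5, 15.1, 12.0, 15.5, 14.3,
        14.3, 15.3, 16.3, 14.3, 15.4, 18.0, 10.2, 10.3, 6.4, 9.6, 16.7]
WL = [44.0, 34.8, 41.1, 36.8, 36.6, 42.8, 28.4, 30.1, 52.4, 37.8, 38.7, 41.5,
      40.3, 43.5, 50.5, 47.3, 46.6, 61.8, 29.1, 32.6, 26.0, 32.2, 45.7]


def simple_ols(x, y):
    """Ordinary least squares for y = w0 + w1 * x, with standard errors and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    sse = sum((yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    residual_variance = sse / (n - 2)  # two fitted parameters
    return {
        "w0": w0,
        "w1": w1,
        "se_w0": math.sqrt(residual_variance * (1.0 / n + mean_x ** 2 / sxx)),
        "se_w1": math.sqrt(residual_variance / sxx),
        "r2": 1.0 - sse / sst,
    }


fit = simple_ols(PF42, WL)
print({key: round(value, 3) for key, value in fit.items()})
```

On the Table 1 data, the fitted coefficients, standard errors, and R² should come out close to the pF4.2 row of Table 3.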

Moreover, the RFR technique was implemented using scikit-learn. Here, 100 of the aforementioned decision trees were built, each taking a different subset of the original dataset, and two estimators of the predictive accuracy, the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE), were obtained by averaging in order to control overfitting. RFR yields better predictive capability than a single DTR [⁴²]. In contrast, DTRs may provide intelligible models [⁴¹].

Next, SVMR capabilities were added to the code using scikit-learn. Bearing in mind eq. (7), where w₀ = b, the aim is to find a function that is as flat as possible and deviates from the training data by at most ε [⁴³], computed as in eq. (9). In this sense, C is a regularization parameter and ϕ is a linear kernel function.
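Eq. (9) is not reproduced in this text. For orientation, the standard ε-insensitive support vector regression problem, as documented for scikit-learn's SVR, is written below. Whether the authors' eq. (9) uses exactly this notation is an assumption, and the slack variables ξ_i, ξ_i* are introduced here only for illustration.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad & y_i - w^{\top}\phi(x_i) - b \le \varepsilon + \xi_i, \\
& w^{\top}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
\]
```

Minimising the norm of w is what makes the function "as flat as possible", while C sets how strongly deviations larger than ε are penalised.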

To sum up, following [⁴⁴], we perform MLR with the whole dataset to establish explainable empirical relationships for W_l, W_p, and PI and to choose the characteristics of the models. Once the characteristics of each model are selected, the four aforementioned regression techniques are used to predict W_l, W_p, and PI. Then, using scikit-learn and following [⁴⁵, ⁴⁶], a K-fold double cross-validation score with K = 10 is computed in terms of RMSE (eq. 10) and MAE (eq. 11) to estimate the accuracy of the candidate models. K-fold cross-validation splits the n observations into K equal subsets, so that a (K − 1)/K fraction of the observations is used for model construction, while the remaining 1/K portion is used for validation. In eqs. (10) and (11), n is the number of observations, ŷ_i are the values predicted by the model, and y_i are the observed values:

RMSE = √[(1/n) ∙ Σ (y_i − ŷ_i)²] (10)

MAE = (1/n) ∙ Σ |y_i − ŷ_i| (11)
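The cross-validated comparison of the four regressors can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' notebook: it runs a single shuffled 10-fold split (the "double" cross-validation scheme is not detailed in the text), uses default hyperparameters except where the text gives a value (100 trees, linear kernel), and fixes arbitrary random seeds. The data are the pF4.2 and W_l columns of Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# pF4.2 (%) and Wl (%) for the 23 samples, copied from Table 1 (rows J1 ... B5).
pf42 = [18.3, 10.3, 16.5, 12.4, 12.7, 15.1, 10.9, 7.5, 15.1, 12.0, 15.5, 14.3,
        14.3, 15.3, 16.3, 14.3, 15.4, 18.0, 10.2, 10.3, 6.4, 9.6, 16.7]
wl = [44.0, 34.8, 41.1, 36.8, 36.6, 42.8, 28.4, 30.1, 52.4, 37.8, 38.7, 41.5,
      40.3, 43.5, 50.5, 47.3, 46.6, 61.8, 29.1, 32.6, 26.0, 32.2, 45.7]

X = np.array(pf42).reshape(-1, 1)  # one characteristic per sample
y = np.array(wl)

models = {
    "MLR": LinearRegression(),
    "DTR": DecisionTreeRegressor(random_state=0),
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVMR": SVR(kernel="linear"),
}

# Assumed splitting scheme: one shuffled 10-fold partition with a fixed seed.
folds = KFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # scikit-learn reports errors as negative scores, so the sign is flipped.
    rmse_per_fold = -cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    results[name] = (rmse_per_fold.mean(), rmse_per_fold.std())
    print(f"{name}: mean RMSE = {results[name][0]:.2f}, std = {results[name][1]:.2f}")
```

Because the fold assignment and seeds are assumptions, the printed scores are not expected to match the published tables exactly; replacing the scoring string with "neg_mean_absolute_error" gives the MAE instead.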

4. Results

Before continuing, we must point out that the analysed samples had a higher concentration of material passing an N200 ASTM sieve than the starting soils, because the samples were sieved with an N40 ASTM sieve to determine the Atterberg limits. That is why we must define the F characteristic, which represents the fine material with a diameter of less than 0.074 mm contained in a sample, following eq. (12):

Table 1 summarises the experimental results obtained. The quantities are expressed as percentages by weight. W_l is the liquid limit. W_p is the plastic limit. The water retained at −33 kPa is pF25. The water retained at −1,500 kPa is pF42. PI is the plasticity index. F is the calculated portion of soil particles in a sample sieved with an N40 ASTM sieve that passes through an N200 ASTM sieve. USCS is the classification of the soil samples in the USCS system. AASHTO is the classification of the soil samples in the AASHTO system with the group index (GI).

Next, in Fig. 2, the analysed samples are presented, following the USCS system. Of the 23 soils, three are high-plasticity clays (CH), two are low-plasticity silts (ML), one is a low-plasticity clayey silt (CL-ML), and the remaining 17 are low-plasticity clays (CL).

Afterwards, non-parametric correlation tests were performed among the study variables (model characteristics) with SPSS 25. These correlation coefficients were taken into account, with certain precautions, to preselect the characteristics to be used in the construction of the models. Table 2 summarises the results obtained after applying Spearman’s ρ test.

Table 2. Spearman’s ρ correlation coefficients and level of bilateral significance.

N = 23 in All Cases		Wl	Wp	PI	pF2.5	pF4.2	F
Wl	Correl.	1.000	0.375	0.915^**	0.732^**	0.838^**	0.439^*
Wl	Sig. (bilat.)	.	0.078	<0.001	<0.001	<0.001	0.036
Wp	Correl.	0.375	1.000	0.077	0.222	0.399	−0.348
Wp	Sig. (bilat.)	0.078	.	0.727	0.308	0.059	0.104
PI	Correl.	0.915^**	0.077	1.000	0.684^**	0.713^**	0.540^**
PI	Sig. (bilat.)	<0.001	0.727	.	<0.001	<0.001	0.008
pF2.5	Correl.	0.732^**	0.222	0.684^**	1.000	0.851^**	0.534^**
pF2.5	Sig. (bilat.)	<0.001	0.308	<0.001	.	<0.001	0.009
pF4.2	Correl.	0.838^**	0.399	0.713^**	0.851^**	1.000	0.364
pF4.2	Sig. (bilat.)	<0.001	0.059	<0.001	<0.001	.	0.088
F	Correl.	0.439^*	−0.348	0.540^**	0.534^**	0.364	1.000
F	Sig. (bilat.)	0.036	0.104	0.008	0.009	0.088	.

Source: the authors.
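For readers who want to see what Spearman's ρ measures, the sketch below computes it as the Pearson correlation of the ranks, with tied values sharing their average rank. The study itself used SPSS 25; the helper names here are hypothetical, and the data are the pF4.2 and W_l columns of Table 1.

```python
import math

# pF4.2 (%) and Wl (%) for the 23 samples, copied from Table 1 (rows J1 ... B5).
PF42 = [18.3, 10.3, 16.5, 12.4, 12.7, 15.1, 10.9, 7.5, 15.1, 12.0, 15.5, 14.3,
        14.3, 15.3, 16.3, 14.3, 15.4, 18.0, 10.2, 10.3, 6.4, 9.6, 16.7]
WL = [44.0, 34.8, 41.1, 36.8, 36.6, 42.8, 28.4, 30.1, 52.4, 37.8, 38.7, 41.5,
      40.3, 43.5, 50.5, 47.3, 46.6, 61.8, 29.1, 32.6, 26.0, 32.2, 45.7]


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


rho = spearman_rho(PF42, WL)
print(round(rho, 3))
```

On these data the result should come out close to the coefficient reported for W_l and pF4.2 in Table 2.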

5. Discussion

First, we use the information in Table 2 to select the appropriate characteristics to build the models, avoiding collinearity between them. Such is the case for pF_2.5 and pF_4.2 (ρ = 0.851; p. < 0.001), so they are not included in the same model. F also shows collinearity with pF_2.5 (ρ = 0.534; p. = 0.009), although it does not seem to present it with respect to pF_4.2 (ρ = 0.364; p. = 0.088).

Consequently, pF_2.5 and pF_4.2 can be selected to build single-feature models, and F and pF_4.2 can be selected for two-feature models. To avoid overfitting, models that do not use powers of any of their characteristics have been chosen, especially since the range of liquid-limit values of the samples is limited (from 26 to 62). Further, the fits to a linear model seem adequate.

5.1. Determination of the liquid limit

Table 2 shows that the correlations between the liquid limit (W_l) and the variables pF_2.5 (ρ = 0.732; p. < 0.001), pF_4.2 (ρ = 0.838; p. < 0.001) and F (ρ = 0.439; p. = 0.036) are significant, with a confidence level greater than 95% in all cases. Consequently, characteristics F, pF_2.5, and pF_4.2 were selected to build linear models with a single characteristic, whereas F and pF_4.2 were used to build a model with two characteristics, which are reflected in Table 3.

Considering the data collected in Table 3, the model with the single characteristic pF_4.2 has been selected, since it is the only one in which all coefficients are statistically significant, with a confidence level of at least 95%. It also presents a considerable fit, expressed as R² (0.722), and is statistically significant, with a confidence level greater than 99%. The selected model is W_l = (9.94 ± 4.2) + (2.25 ± 0.3) ∙pF_4.2 (eq. 13). The fit between the measured W_l values and those calculated through eq. (13) is shown in Fig. 3.

Table 3 Linear models studied to determine Wl.

Characteristics of the Model	Coefficients	Std. Err.	P > |t|	R² and Sig. Change in F
F	W₀ = 13.31 W₁ = 0.31	10.0 0.1	0.199 <0.013	0.258 (p. = 0.013)
pF4.2	W₀ = 9.94 W₁ = 2.25	4.2 0.3	0.028 0.001	0.722 (p. < 0.001)
pF2.5	W₀ = −23.9 W₁ = 2.58	13.8 0.6	0.099 <0.001	0.505 (p. < 0.001)
pF4.2 F	W₀ = 4.85 W₁ = 2.07 W₂ = 0.08	6.3 0.3 0.1	0.448 <0.001 0.290	0.737 (p. < 0.001)

Source: the authors.

Figure 3 Fit of the selected model to determine the liquid limit (Wl). Measures expressed as percentages by weight.

Table 4 Different liquid limit models scores with the pF4.2 characteristic and double Cross-validation (K=10) in terms of mean RMSE with mean standard deviation.

Regression model	Mean RMSE	Mean RMSE Standard deviation
MLR	4.15	2.73
DTR	7.06	2.98
RFR	6.03	2.88
SVMR	3.65	2.92

Source: the authors.

Table 5 Proposed model to determine PI.

Characteristics of the Model	Coefficients	Std. Err.	P > |t|	R² and Sig. Change in F
F	W₀ = −14.45 W₁ = 0.37	7.9 0.1	0.080 <0.001	0.451 (p. < 0.001)
pF4.2	W₀ = −7.72 W₁ = 1.92	4.4 0.3	0.096 <0.001	0.628 (p. < 0.001)
pF2.5	W₀ = −41.99 W₁ = 2.42	12.3 0.5	0.003 <0.001	0.531 (p. < 0.001)
pF4.2 F	W₀ = −20.47 W₁ = 1.48 W₂ = 0.21	5.6 0.3 0.1	0.002 <0.001 0.007	0.745 (p. < 0.001)

Source: the authors.

In Table 4, we can see the scores of MLR, DTR, RFR and SVMR models calculated as the mean RMSE of a cross-validation procedure with 10 K-folds.

The SVMR model exhibits the lowest mean RMSE (3.65), followed by the MLR model (4.15), although the MLR model has a lower standard deviation (2.73 vs 2.92). DTR and RFR show the worst scores. On the other hand, [¹¹] established the tolerance limits in the reproducibility required for the determination of the liquid limit at ±5% for control processes and at ±10% for design work. Accordingly, MLR and SVMR models would be acceptable for design purposes.

5.2. Determination of the plasticity index

Table 2 shows that the linear correlations between the PI and the variables pF_2.5 (ρ = 0.684; p. < 0.001), pF_4.2 (ρ = 0.713; p. < 0.001), and F (ρ = 0.540; p. = 0.008) are significant, with a confidence level greater than 99% in all cases. Consequently, the characteristics F, pF_2.5, and pF_4.2 were selected to build linear models of a single characteristic, and F and pF_4.2 were selected for a model of two characteristics, which are reflected in Table 5.

In view of the data collected in Table 5, the model with the characteristics pF_4.2 and F was selected, since all of its coefficients are statistically significant, with a confidence level higher than 99%, and it presents the highest R² (0.745), which is statistically significant with a confidence level greater than 99%. The selected model is PI = (−20.47 ± 5.6) + (1.48 ± 0.3) ∙pF_4.2 + (0.21 ± 0.1) ∙F (eq. 14). The fit between the measured PI values and those calculated through eq. (14) is shown in Fig. 4.

Source: the authors.

Figure 4 Fit of the selected model to determine the Plasticity Index (PI).

Next, in Table 6, we show the scores of MLR, DTR, RFR and SVMR models calculated as the mean RMSE of a cross-validation procedure with 10 K-folds.

Table 6 Different plasticity index models scores with pF_4.2 and F characteristics and double Cross-validation (K=10) in terms of mean RMSE with mean standard deviation.

Regression model	Mean RMSE	Mean RMSE Standard deviation
MLR	3.81	2.36
DTR	5.93	3.41
RFR	4.41	2.26
SVMR	4.04	2.31

Source: the authors.

Table 7 Proposed model to determine Wp.

Characteristics of the Model	Coefficients	Std. Err.	P > |t|	R² and Sig. Change in F
pF4.2 F	W₀ = 25.32 W₁ = 0.60 W₂ = −0.13	3.5 0.2 0.04	<0.001 0.006 0.009	0.380 (p. = 0.008)

Source: the authors.

This time, the MLR model exhibits the lowest mean RMSE (3.81), followed by the SVMR model (4.04), although the SVMR model has a lower standard deviation (2.31 vs 2.36). DTR and RFR have the worst scores and are not considered, because DTR models showed overfitting without cross-validation (RMSE = 0). Although [¹¹] did not establish an acceptable tolerance margin for the determination of PI, and this measure carries, by definition, the errors in the determination of W_l and W_p, the tolerance margins that this researcher considered acceptable for the measurement of W_l are taken as the reference, as the most restrictive. Accordingly, MLR and SVMR models would be acceptable for design purposes (±10%).

5.3. Determination of the plastic limit

In order to determine W_p we have selected pF_4.2 and F as the characteristics of the model, as we show in Table 7.

In this regard, W_p is modelled directly rather than obtained indirectly as the difference between W_l and PI, since the indirect value would carry the errors of both determinations. Further, W_l and W_p are determined with very different standards and methods. In this way, using the whole dataset and the coefficients listed in Table 7, the following experimental relation is obtained (eq. 15): W_p = (25.32 ± 3.5) + (0.60 ± 0.2) ∙pF_4.2 − (0.13 ± 0.04) ∙F.

Next, the fit between the measured W_p values and those calculated by means of eq. (15) is presented in Fig. 5.

Source: the authors.

Figure 5 Fit of the selected model to determine the plastic limit (W_p).
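As a usage illustration, the three selected linear models can be evaluated for a sample from its single pressure-membrane measurement and its fines content. The sketch below takes the central coefficients from the selected rows of Tables 3, 5, and 7 and ignores their standard errors. Note that the W_p intercept is taken from Table 7 (25.32), whereas the abstract quotes 23.32, so that value should be treated as an assumption. The function name is hypothetical.

```python
def predict_limits(pf42, fines):
    """Estimate Wl, PI and Wp (all in %) from pF4.2 (%) and fines content F (%).

    Central coefficients only; the W_p intercept follows Table 7 (assumption).
    """
    return {
        "Wl": 9.94 + 2.25 * pf42,
        "PI": -20.47 + 1.48 * pf42 + 0.21 * fines,
        "Wp": 25.32 + 0.60 * pf42 - 0.13 * fines,
    }


# Sample J1 from Table 1: pF4.2 = 18.3 %, F = 92.1 %.
estimate = predict_limits(18.3, 92.1)
print({name: round(value, 1) for name, value in estimate.items()})
```

Because W_l, PI, and W_p are fitted independently, the estimated PI is not constrained to equal the difference between the estimated W_l and W_p.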

Subsequently, in Table 8, we collect the scores of MLR, DTR, RFR and SVMR models with pF_4.2 and F characteristics, calculated as the mean RMSE of a cross-validation procedure with 10 K-folds.

Table 8 Different plastic limit models scores with pF_4.2 and F characteristics and double cross-validation (K=10) in terms of mean RMSE with mean standard deviation.

Regression model	Mean RMSE	Mean RMSE Standard deviation
MLR	2.47	0.91
DTR	3.57	1.34
RFR	2.64	1.46
SVMR	2.32	1.30

Source: the authors.

Now, the SVMR model exhibits the lowest mean RMSE (2.32), followed by the MLR model (2.47), although the MLR technique yields a lower standard deviation (0.91 vs 1.30). DTR and RFR show higher scores and exhibit overfitting without cross-validation (RMSE = 0), so they are rejected. Moreover, [¹¹] found a tolerance margin of reproducibility for the rolling test method of ±10% for control purposes. Therefore, using pressure-membrane extractors offers an acceptable uncertainty, especially since all the scores are below a tolerance margin of ±5%, so MLR and SVMR models appear very precise. Furthermore, [⁴⁷] found an uncertainty of ±20% in the determination of W_p using penetrometers. Consequently, MLR and SVMR models would be acceptable for control purposes.

6. Conclusions

The aim of this study was to develop an alternative method suitable for the determination of Atterberg limits using a pressure-membrane apparatus and MCSs.

In addition, three models have been described using MLR and SVMR techniques that would allow the determination of W_l, W_p, and PI in the analysed soils. In this regard, the selected characteristics were pF_4.2 and F. The tolerance margins shown for W_l and PI seem appropriate for design purposes. On the other hand, in the determination of W_p, we have found appropriate tolerance margins for control work.

Likewise, in view of the experimental results, machine-learning techniques simplify the method proposed by [⁹], offering models that conceptually make sense with respect to the Atterberg limits. Thus, W_p and PI increase proportionally as the capacity to retain water strongly bound to the soil particles and in pores with a diameter of less than 0.2 μm increases. The amount of fine material with a diameter of less than 0.074 mm also affects these plasticity indices, but to a lesser extent. However, estimating the liquid limit only requires measuring the capacity of the samples to retain water in fine pores smaller than 0.2 μm and to form films around their mineral grains (pF_4.2), and we have found that the liquid limit increases with this capacity.

Furthermore, while the precision of the method could be improved in later studies, using it could entail certain advantages. For instance, it allows several static tests of different samples to be carried out at the same time, eliminating subjectivity in the determinations and increasing the productivity of the laboratories. Likewise, it could reduce the need for frequent trial-and-error tests in standardized methods if it is used as an indicative test of the moisture required for the determination of W_l, W_p, and PI. Besides, cheaper methods may be developed.

References

[1] Blackall, T.E., A.M. Atterberg 1846-1916. Geotechnique, 3(1), pp. 17-19, 1952. DOI: https://doi.org/10.1680/geot.1952.3.1.17

[2] Galindo, R., Lara, A. and Guillán, G., Contribution to the knowledge of early geotechnics during the twentieth century: Arthur Casagrande. History of Geo- and Space Sciences, 9(2), pp. 107-123, 2018. DOI: https://doi.org/10.5194/hgss-9-107-2018

[3] Casagrande, A., Classification and identification of soils. In: Proceedings of the American Society of Civil Engineers, [online]. 1947, pp. 901-991. [date of reference May 11th of 2022]. Available at: https://cedb.asce.org/CEDBsearch/record.jsp?dockey=0371060

[4] Normas NLT I: ensayos de carreteras. Granulometría de suelos por tamizado (NLT 104/91). Madrid: Centro de Estudios y Experimentación de Obras Públicas. 1992.

[5] American Society for Testing and Materials. Standard practice for classification of soils for engineering purposes: unified soil classification system (ASTM D 2487-06). ASTM International, West Conshohocken, PA, USA, 2006. DOI: https://doi.org/10.1520/D2487-06

[6] American Society for Testing and Materials. Standard practice for classification of soils and soil-aggregate mixtures for highway construction purposes (ASTM D 3282-93). ASTM International, West Conshohocken, PA, USA, 2004. DOI: https://doi.org/10.1520/D3282-93R04E01

[7] Normas NLT I: ensayos de carreteras. Determinación del límite líquido de un suelo por el método de Casagrande (NLT 105/91). Centro de Estudios y Experimentación de Obras Públicas, Madrid, España, 1992.

[8] Wires, K.C., The Casagrande method versus the drop-cone penetrometer method for the determination of liquid limit. Canadian Journal of Soil Science, 64(2), pp. 297-300, 1984. DOI: https://doi.org/10.4141/cjss84-031

[9] Rosas, D.A., Procedimiento para la determinación del límite líquido, limite plástico e índice de plasticidad mediante extractor de presión membrana o ‘aparato de Richards’. Patent ES2301297A1. [online]. 2005 [date of reference May 11th of 2022]. Available at: https://patents.google.com/patent/ES2301297A1/es?oq=david+antonio+rosas+espin

[10] Richards, L.A., A pressure-membrane extraction apparatus for soil solution. Soil Science, 51(5), pp. 377-386, 1941. DOI: https://doi.org/10.1097/00010694-194105000-00005

[11] Sherwood, P.T., The reproducibility of the results of soil classification and compaction tests. RRL Reports, Road Research Lab/UK/. [online]. 1970. [date of reference May 11th of 2022]. Available at: https://trid.trb.org/view/121268

[12] Torres, A. and Tadeo, A.I., Análisis de la Norma de Ensayo NLT 105/91, ‘Determinación del límite líquido de un suelo por el método del aparato Casagrande’. Revista Digital del Cedex (117), pp. 93-93, [online]. 2000. [date of reference May 11th of 2022]. Available at: http://ingenieriacivil.cedex.es/index.php/ingenieria-civil/article/view/1535

[13] Haigh, S.K., Vardanega, P.J. and Bolton, M.D., The plastic limit of clays. Géotechnique, 63(6), pp. 435-440, 2013. DOI: https://doi.org/10.1680/geot.11.P.123

[14] Sorensen, K.K. and Okkels, N., Correlation between drained shear strength and plasticity index of undisturbed overconsolidated clays. In: Proceedings of the 18th International Conference on Soil Mechanics and Geotechnical Engineering, [online]. 1, pp. 423-428, 2013. [date of reference May 11th of 2022]. Available at: https://www.geo.dk/media/1209/correlation-between-drained-shear-strength-and-plasticity-index-of-undisturbed-overconsolidated-clays_final-after-1-review_b_sorensen-okkels.pdf

[15] Soil Survey Staff., Soil survey field and laboratory methods manual. Soil Survey Investigations Report No. 51, Version 2.0. R. Burt and Soil Survey Staff (ed.). U.S. Department of Agriculture, Natural Resources Conservation Service. [online]. 2014. [date of reference May 11th of 2022]. Available at: https://www.nrcs.usda.gov/Internet/FSE_DOCUMENTS/stelprdb1244466.pdf

[16] Shinseki, E.K., and Hudson, J.B., The unified soil classification system. Material Testing Field Manual No. 5-742. Appendix B. Headquarters of the Department of the Army. [online]. Washington, D.C., USA, 2001. Available at: http://www.globalsecurity.org/military/library/policy/army/fm/5-472/fm5-472reprint.pdf

[17] Schmitz, R.M., Schroeder, C. and Charlier, R., Chemo-mechanical interactions in clay: a correlation between clay mineralogy and Atterberg limits. Applied Clay Science, 26(1-4), pp. 351-358, 2014. DOI: https://doi.org/10.1016/j.clay.2003.12.015

[18] Lambe, T.W. and Whitman, R.V., Soil mechanics. John Wiley & Sons, New York, USA, 1969.

[19] American Society for Testing and Materials. Standard test method for liquid limit, plastic limit, and plasticity index of soils (D4318-17e1). ASTM International. West Conshohocken, PA, USA, 2018. DOI: https://doi.org/10.1520/D4318-17E01

[20] Sanz, C., Galindo, J., Alfaro, P., and Ruano, P., El relieve de la Cordillera Bética. Enseñanza de las Ciencias de la Tierra, 15(2), pp. 185-195, 2007. [date of reference May 11th of 2022]. Available at: https://www.raco.cat/index.php/ECT/article/download/120969/166484

[21] Zhang, C. and Lu, N., Unitary definition of matric suction. Journal of Geotechnical and Geoenvironmental Engineering, 145(2), art. 02818004, [online]. 2019. [date of reference May 11th of 2022]. Available at: https://doi.org/10.1061/(ASCE)GT.1943-5606.0002004

[22] Cianfrani, C., Buri, A., Vittoz, P., Grand, S., Zingg, B., Verrecchia, E. and Guisan, A., Spatial modelling of soil water holding capacity improves models of plant distributions in mountain landscapes. Plant and Soil, 438(1-2), pp. 57-70, 2019. DOI: https://doi.org/10.1007/s11104-019-04016-x

[23] Torrent, J., Campillo, M.C. and Barrón, V., Predicting cation exchange capacity from hygroscopic moisture in agricultural soils of Western Europe. Spanish Journal of Agricultural Research, 13(4), art. 8212, 2015. DOI: http://dx.doi.org/10.5424/sjar/2015134-8212

[24] Wuddivira, M.N., Robinson, D.A., Lebron, I., Bréchet, L., Atwell, M., De Caires, S. and Tuller, M., Estimation of soil clay content from hygroscopic water content measurements. Soil Science Society of America Journal, 76(5), pp. 1529-1535, 2012. DOI: https://doi.org/10.2136/sssaj2012.0034

[25] Brunauer, S., Emmett, P.H. and Teller, E., Adsorption of gases in multimolecular layers. Journal of the American Chemical Society, 60(2), pp. 309-319, 1938. DOI: https://doi.org/10.1021/ja01269a023

[26] Lu, N. and Zhang, C., Soil sorptive potential: concept, theory, and verification. Journal of Geotechnical and Geoenvironmental Engineering, [online]. 145(4), art. 04019006, 2019. [date of reference May 11th of 2022]. Available at: https://doi.org/10.1061/(ASCE)GT.1943-5606.0002025

[27] Lu, N., Generalized soil water retention equation for adsorption and capillarity. Journal of Geotechnical and Geoenvironmental Engineering, [online]. 142(10), art. 04016051, 2016. [date of reference May 11th of 2022]. Available at: https://doi.org/10.1061/(ASCE)GT.1943-5606.0001524

[28] Pham, H.Q., Fredlund, D.G. and Barbour, S.L., A study of hysteresis models for soil-water characteristic curves. Canadian Geotechnical Journal, 42(6), pp. 1548-1568, 2005. DOI: https://doi.org/10.1139/t05-071

[29] Weil, R.R. and Brady, N.C., The nature and properties of soils: Global edition. England: Pearson, 2016.

[30] Kim, W.S. and Borden, R.H., Influence of soil type and stress state on predicting shear strength of unsaturated soils using the soil-water characteristic curve. Canadian Geotechnical Journal, 48(12), pp. 1886-1900, 2011. DOI: https://doi.org/10.1139/t11-082

[31] IBM corp. SPSS V.25 documentation. [date of reference May 11th of 2022]. Available at: https://www.ibm.com/mysupport/s/topic/0TO500000001yjtGAA/spss-statistics?language=en_US

[32] Chow, C.K., Statistical independence and threshold functions. IEEE Transactions on Electronic Computers, EC-14(1), pp. 66-68, 1965. DOI: https://doi.org/10.1109/pgec.1965.264059

[33] Woźniak, M., Graña, M. and Corchado, E., A survey of multiple classifier systems as hybrid systems. Information Fusion, 16, pp. 3-17, 2014. DOI: https://doi.org/10.1016/j.inffus.2013.04.006

[34] Van Rossum, G. and Drake, F.L., Python 3 Reference Manual. CreateSpace, Scotts Valley, CA, USA, 2021. [date of reference May 11th of 2022]. Available at: https://docs.python.org/3/reference/

[35] McKinney, W., Data structures for statistical computing in Python, In: Proceedings of the 9th Python in Science Conference, vol. 445, 2011, pp. 51-56. DOI: https://doi.org/10.25080/majora-92bf1922-00a

[36] Waskom, M., Botvinnik, O., O’Kane, D., Hobson, P., Lukauskas, S., Gemperline, D. and Qalieh, A., Mwaskom/Seaborn: v0.8.1. Zenodo. September, 2017. [date of reference May 11th of 2022]. DOI: https://doi.org/10.5281/zenodo.883859

[37] Hunter, J.D., Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), pp. 90-95, 2017. DOI: https://doi.org/10.1109/mcse.2007.55

[38] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O. and Vanderplas, J., Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, pp. 2825-2830, 2011. DOI: https://dl.acm.org/doi/10.5555/1953048.2078195

[39] Seabold, S. and Perktold, J., Statsmodels: econometric and statistical modeling with Python, In: Proceedings of the 9th Python in Science Conference, vol. 57, 61 P., 2010. DOI: https://doi.org/10.25080/majora-92bf1922-011

[40] Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B.E., Bussonnier, M., Frederic, J. and Ivanov, P., Jupyter Notebooks: a publishing format for reproducible computational workflows. IOS Press Ebooks, 2016, pp. 87-90. [date of reference May 11th of 2022]. DOI: https://doi.org/10.3233/978-1-61499-649-1-87

[41] Scikit-learn Developers. Linear models (v. 0.23.2). [online]. 2020. [date of reference May 11th of 2022]. Available at: https://scikit-learn.org/stable/modules/linear_model.html

[42] Breiman, L., Random forests. Machine learning, 45(1), pp. 5-32, 2011. DOI: https://doi.org/10.1023/A:1010933404324

[43] Smola, A.J. and Schölkopf, B., A tutorial on support vector regression. Statistics and Computing, 14(3), pp. 199-222, 2014. DOI: https://doi.org/10.1023/B:STCO.0000035301.49549.88

[44] Kozak, A. and Kozak, R., Does cross validation provide additional information in the evaluation of regression models?, Canadian Journal of Forest Research, 33(6), pp. 976-987, 2003. DOI: https://doi.org/10.1139/x03-022

[45] Stone, M., Cross‐validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological), 36(2), pp. 111-133, 1974. DOI: https://doi.org/10.1111/j.2517-6161.1974.tb00994.x

[46] Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection. Ijcai, 14(2), pp. 1137-1145, 1995. DOI: https://dl.acm.org/doi/10.5555/1643031.1643047

[47] Shimobe, S. and Spagnoli, G., A global database considering Atterberg limits with the Casagrande and fall-cone tests. Engineering Geology, 260, art. 105201, 2019. DOI: https://doi.org/10.1016/j.enggeo.2019.105201

How to cite: Rosas, D.A., Burgos, D., Branch, J.W. and Corbi, A., Automatic determination of the Atterberg limits with machine learning. DYNA, 89(224), pp. 34-42, October - December, 2022.

D.A. Rosas is a PhD candidate and researcher in Computer Science at UNIR iTED (Universidad Internacional de la Rioja-Spain), where he received an Excellence Grant. He was also CEO of GTD, a Spanish R&D company devoted to Engineering Geology and Environmental Projects. ORCID: 0000-0002-9722-2659

D. Burgos is the Vice Chancellor of International Projects at the Universidad Internacional de la Rioja (Spain). He is also a professor at the Departamento de Ciencias de la Computación y de la Decisión, Facultad de Minas, Universidad Nacional de Colombia, Sede Medellín, Colombia. In addition, he is Director of the Research Institute UNIR iTED and director of the UNESCO chair in eLearning. He holds several PhDs in different fields, such as Computer Science, Education, Communication, Management and Anthropology. ORCID: 0000-0003-0498-1101

J.W. Branch-Bedoya received a PhD in Computer Science. He is a professor at the Departamento de Ciencias de la Computación y de la Decisión, Facultad de Minas, Universidad Nacional de Colombia, Sede Medellín, Colombia. He is also the leader of the Grupo de Investigación y Desarrollo en Inteligencia Artificial (GIDIA). ORCID: 0000-0002-0378-028X

A. Corbi received a PhD in Corpuscular Physics and a Diploma in Advanced Studies in Physical Oceanography. He is a professor and researcher at UNIR iTED (ESIT-Universidad Internacional de la Rioja-Spain). ORCID: 0000-0002-7282-4557

Appendix

A. Python code example for Jupyter Notebook and dataset in CSV format, available at this GitHub repository: https://github.com/davidantoniorosas/Dyna

Received: May 12, 2022; Revised: September 09, 2022; Accepted: September 15, 2022

This is an open-access article distributed under the terms of the Creative Commons Attribution License
