1. Introduction
Pipes are used worldwide to transport liquids with different properties. Non-Newtonian fluids, such as drilling mud, cementitious composites, and pastes, are transported in pipelines in the mining and metallurgical industries 1. In contrast, Newtonian fluids have a wider field of use, especially in turbulent flow over rough surfaces, with engineering applications such as industrial plants, internal distribution networks in buildings, hydraulic turbines, irrigation systems, and drinking water pipelines 2, as well as in open-channel hydraulics 3. Head losses occur in both pipes and open channels and are an essential parameter in the design and operation of flow circulation in hydraulic works 4,5.
In piping systems, head losses are analyzed with the universal Darcy-Weisbach equation, which depends on the friction factor (f). Colebrook 6 proposed an implicit equation that is currently regarded as the best approximation of the friction factor, especially for turbulent flow 7. Nevertheless, its calculation is cumbersome because the friction factor appears on both sides of the equation. Its solution therefore requires iterative methods, such as the Newton-Raphson method, which demand more computation time. Although diagnostic and control algorithms are implemented in the mathematical modeling of hydraulic systems, precise parameter tuning is necessary.
Several authors 8-15 have developed explicit approximations of the friction factor as an alternative to the Colebrook equation, but these explicit models differ in their accuracy and computational efficiency 16-19. The work presented by 20 highlighted that the equation by 21 was more accurate than the Colebrook equation for the experimental data in their research. On the other hand, 22 report that the equations by 16 and 23 are the most efficient, with maximum recorded errors of 0.18% and 0.54%, respectively. Likewise, 24 state that the equations available in the literature lead to a deviation of between 2% and 3% for a turbulent flow with a Reynolds number of 2300. In turn, they suggest a new equation based on the relationship between friction forces and viscous forces to determine f with a maximum standard deviation of 0.25% with respect to the Colebrook equation.
In recent years, there have been significant contributions to predicting the friction factor with artificial intelligence approaches such as Gene Expression Programming (GEP), Evolutionary Polynomial Regression (EPR), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Artificial Neural Network (ANN), as well as physical and numerical models that predict fluid behavior in different media 25-28. In particular, 26 estimated f using Bayesian learning neural networks and reached a relative error of 0.0035%. Furthermore, 29 reached mean absolute errors of 0.001% using several artificial intelligence approaches. In this sense, 30 cite some gaps in the artificial intelligence technique, such as the data set, the layers of predesignated neurons, the percentage of training, and the test in the model tree. In addition, it is necessary to increase the number of variables and implicit functions of the friction factor, and there is still a need to include model selection criteria.
Many authors tend to evaluate models with the Mean Squared Error (MSE), Mean Relative Error (MRE), Mean Absolute Error (MAE), Standard Deviation (SD), and Relative Error (RE) 7,22,31. This has several disadvantages when models are compared, since the value of the correlation coefficient (R) tends to increase as the number of variables in the mathematical model increases 32. Therefore, the value of R can be raised simply by making the models more complex.
There are several techniques to adjust the training error for model size, such as the Model Selection Criterion (MSC), Akaike Information Criterion (AIC) 33, Bayesian Information Criterion (BIC) 34, and Mallows' Cp criterion 35. The MSC and AIC have been applied to select the best prediction model, but there have been limits, $4000 < Re < 10^{8}$ and $10^{-6} < e/D < 5 \cdot 10^{-2}$ 12,36, and discrepancies in the results. The selection is important because the choice of criterion could affect the interpretation of the variable as well as its prediction. Thus, the following hypothesis is proposed in the present study: the explicit friction factor equations can be classified, and GEP can provide a new equation with a minimum error. In this sense, the goal of this work was to evaluate and classify the explicit pipe friction factor equations and to suggest a new one with the least amount of error.
2. Materials and methods
The Colebrook equation is the most cited, accepted, and validated equation in fluid dynamics studies for obtaining friction losses in pipes.
It relates, in its implicit form, the unknown friction factor (f), the known relative roughness of the pipe's inner surface (e/D), and the known Reynolds number (Re). It is valid for $4000 < Re < 10^{8}$ and $0 < e/D < 5 \cdot 10^{-2}$, as shown in Equation 1. However, Equation 1 requires some mathematical iterations to reach the solution.
\[ \frac{1}{\sqrt{f}} = -2 \log_{10}\left( \frac{e/D}{3.7} + \frac{2.51}{Re \sqrt{f}} \right) \tag{1} \]
where f is the implicit friction factor, e is the absolute roughness of the pipe's inside wall, D is the pipe diameter, and Re is the Reynolds number.
Nonetheless, there are several explicit approaches reported in the scientific literature to calculate the friction factor, as shown in Equations 2 to 36.
(Equations 2 to 36 are not reproduced here. The surviving labels are: 14 Model I; 14 Model II; the auxiliary terms β and S; 11 Model I; 11 Model II.)
The Colebrook equation and the 30 explicit equations found in the scientific literature were evaluated for relative roughness (e/D) values from $10^{-6}$ to $5 \cdot 10^{-2}$ and Reynolds numbers from 4000 to $10^{8}$, which yielded a base of 47601 data points. The analysis interval covers both the onset of turbulence and complete turbulence in order to test the behavior of the correlations in the mathematical formulations.
In this study, the Newton-Raphson method was applied to the Colebrook equation (Equation 1) using a Python algorithm. The method is widely used because of its simplicity and speed of convergence in solving nonlinear problems, systems of equations, and nonlinear differential and integral equations 23.
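The iteration described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the standard form of the Colebrook equation with constants 3.7 and 2.51, works on the variable x = 1/sqrt(f), and uses a hypothetical function name, starting guess, and tolerance.

```python
import math


def colebrook_newton(re, rel_rough, tol=1e-12, max_iter=50):
    """Solve the Colebrook equation for f by Newton-Raphson.

    Assumed form: 1/sqrt(f) = -2*log10(rel_rough/3.7 + 2.51/(re*sqrt(f))).
    With x = 1/sqrt(f), the residual is g(x) = x + 2*log10(rel_rough/3.7 + 2.51*x/re).
    """
    x = 1.0 / math.sqrt(0.02)  # starting guess (assumption): f = 0.02
    for _ in range(max_iter):
        arg = rel_rough / 3.7 + 2.51 * x / re
        g = x + 2.0 * math.log10(arg)
        # Derivative of g with respect to x
        dg = 1.0 + 2.0 * (2.51 / re) / (arg * math.log(10.0))
        step = g / dg
        x -= step
        if abs(step) < tol:
            break
    return 1.0 / (x * x)


if __name__ == "__main__":
    # Example point inside the stated validity range
    print(colebrook_newton(1e5, 1e-4))
```

Working on x rather than f keeps the residual smooth and monotonic, so a handful of iterations is usually enough across the range of Re and e/D considered here.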
Similarly, Gene Expression Programming (GEP), implemented in the GeneXpro software, was applied after the evaluation and classification of the equations to generate the most suitable equations. Initially, the database of 47601 data points was used to select the best fits according to their fitness and to introduce genetic variation using genetic operators.
Additionally, the procedure for estimating the pipeline friction coefficient using GEP involved selecting the fitness function, choosing the set of terminals (T) and the set of functions (F) to create chromosomes, and choosing the chromosome architecture, the linking function, and the genetic operators.
The runs used 30 chromosomes, a head size of 8, and 1, 2, 3, or 6 genes; linking functions (+, -, *, /); and mathematical functions, divided into GEP1, GEP2, GEP3, and GEP4, drawn from +, -, /, ·, $\sqrt{x}$, $e^{x}$, $\log_{10}$, $10^{x}$, $x^{1/3}$, $x^{1/4}$, $x^{1/5}$, $x^{2}$, $x^{3}$, $x^{4}$, $x^{5}$, and $x^{1/x}$.
In the investigation, the percent standard deviation (PSD) and the Maximum Relative Error (∆f/f), Equation 37, were used as accuracy criteria for the explicit models.
Additionally, efficient methods of model comparison and selection based on model complexity were applied: the Model Selection Criterion (MSC) 29 and the Akaike Information Criterion (AIC) 26. These criteria, expressed by Equations 38 and 39, favor the greatest likelihood with the smallest number of parameters and assume that the variables follow a normal distribution.
Here, $f_{CW}$ is the true value of the Colebrook-White (CW) friction factor, $f_{proposed}$ is the value of the proposed friction factor, p is the number of equation parameters including constants, i = 1, ..., n is the index of the friction factor values, and n is the sample size.
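The criteria described above can be sketched in Python. Because Equations 37 to 39 are not reproduced in this text, the forms below are assumptions rather than the authors' exact definitions: the maximum relative error as a percentage of the Colebrook value, the least-squares form of the AIC, n·ln(SSE/n) + 2p, and a commonly used form of the MSC, ln(SST/SSE) - 2p/n. All function names are hypothetical.

```python
import math


def max_relative_error(f_cw, f_prop):
    """Maximum relative error in percent, relative to the Colebrook value."""
    return 100.0 * max(abs(p - c) / c for c, p in zip(f_cw, f_prop))


def sum_squared_error(f_cw, f_prop):
    """Sum of squared residuals between reference and proposed values."""
    return sum((c - p) ** 2 for c, p in zip(f_cw, f_prop))


def aic_least_squares(f_cw, f_prop, n_params):
    """AIC in its least-squares form (assumed): n*ln(SSE/n) + 2p."""
    n = len(f_cw)
    return n * math.log(sum_squared_error(f_cw, f_prop) / n) + 2 * n_params


def msc(f_cw, f_prop, n_params):
    """Model Selection Criterion (assumed form): ln(SST/SSE) - 2p/n."""
    n = len(f_cw)
    mean_cw = sum(f_cw) / n
    sst = sum((c - mean_cw) ** 2 for c in f_cw)
    return math.log(sst / sum_squared_error(f_cw, f_prop)) - 2 * n_params / n


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study
    reference = [0.020, 0.030, 0.040]
    proposed = [0.021, 0.030, 0.039]
    print(max_relative_error(reference, proposed))
    print(aic_least_squares(reference, proposed, n_params=2))
    print(msc(reference, proposed, n_params=2))
```

In these assumed forms a lower AIC and a higher MSC indicate the preferred model, and both penalize each additional parameter p.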
3. Results and discussion
Figure 1 shows the accuracies of the explicit models according to the Maximum Relative Error (∆f/f) and the percent standard deviation (PSD). Figure 1a shows that the ∆f/f values ranged from 0.082% to 38.435%, and 43% of the equations had a Maximum Relative Error lower than 2.0%. Group I contains the most efficient approximations, with a Maximum Relative Error of less than 1%; these are therefore recommended for precision engineering work. Within Group I, the results are outstanding for the equations of 12, model I of 11, 16, 17, 23, 24, 31, and 40, with ∆f/f < 0.5%, while the equations by 13, 39, and 50 have 0.5% < ∆f/f < 1%.
Other authors have formulated new, noteworthy, and accurate equations; these are classified in group II because they have a Maximum Relative Error of less than 2%, namely model II of 11 and the equation by 45.
Group III was classified as having a lower approximation to the Colebrook equation, with a Maximum Relative Error of 2.587% ≤ ∆f/f ≤ 8.303%, and comprises the equations by 10, 21, 38, 44, 46, and 51 and models I and II by 14. However, according to 20, the equation by 21 was the most accurate in their research. A possible cause is that 20 only used 2397 experimental points, with $3000 \le Re \le 735 \cdot 10^{3}$ and $0 < e/D < 1.4 \cdot 10^{-3}$. Group IV had to be rejected because its equations, those by 9, 37, 42, 47, 48, 49, and 52, exceeded ∆f/f > 10%. In particular, the equation proposed by 9 provided significant results for solving problems at the time, but new and more accurate formulations have since been developed.
These results agree with those obtained by 22, who evaluated 33 equations over a range of the Moody diagram with $2300 \le Re \le 10^{8}$ and $0 < e/D < 5 \cdot 10^{-2}$ and found that the error of the equation proposed by 9 was high, exceeding 10%. Similarly, they agree with the results by 29 on the mathematical models analyzed using machine learning tools, in which 9 and 42 were the least favorable equations.
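The grouping by Maximum Relative Error can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, and because the text leaves the intervals between 2% and 2.587% and between 8.303% and 10% unassigned, the code assumes that everything from 2% up to 10% falls in group III.

```python
def classify_group(max_rel_error_pct):
    """Assign an accuracy group from the maximum relative error in percent.

    Thresholds follow the text (I: < 1%, II: < 2%, IV: > 10%).
    Assumption: values from 2% to 10% are all treated as group III.
    """
    if max_rel_error_pct < 1.0:
        return "I"
    if max_rel_error_pct < 2.0:
        return "II"
    if max_rel_error_pct <= 10.0:
        return "III"
    return "IV"


if __name__ == "__main__":
    # The extreme values reported for Figure 1a
    for value in (0.082, 38.435):
        print(value, classify_group(value))
```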
Regarding Figure 1b and the percent standard deviation (PSD), the 30 equations analyzed generally presented a deviation of 1.2% < PSD < 2%, and 81% of the equations had a stable standard deviation between 1.5% and 1.6%. Three equations, those by 9, 37, and 48, had the lowest standard deviation, but they were rejected because of their high relative error.
The 30 equations analyzed in this article show two tendencies: the equations with a high number of parameters tend to be more accurate, and the equations with the fewest parameters tend to be less accurate. On the other hand, according to 24, the engineer needs the easiest and most accurate equation for calculating the friction factor. In summary, as a result of the increasing digitization of work, educational, and economic environments, the equations must be formulated with the highest precision and best computational performance.
For this reason, the MSC and AIC model selection criteria were applied through a ranking, because they treat the number of parameters in the equations, including constants (p), as a decisive variable.
Based on the accuracies of the models, a preliminary model ranking (Rk) was proposed for each evaluation criterion (p, ∆f/f, PSD, MSC, and AIC), followed by a Global Ranking. Table 1 shows the results of the models. The error-based statistics and the theoretical criteria rank the equations in different orders, which leads to a discrepancy in optimal model selection. Equation 5, proposed by 37, is the simplest and has the fewest steps to obtain the friction factor. Nevertheless, it was rejected in the previous analysis because of its high relative error, for which it is ranked 30. Meanwhile, Equation 11 by 17 is classified as the most complex to solve because of the number of steps and parameters it includes. However, it was classified in group I with a relative error of less than 0.5% and an acceptable deviation of less than 1.6%, with a ranking of 8.
In this sense, MSC and AIC contributed to the selection of the best model. However, the two criteria present discrepancies with respect to the maximum likelihood function and entropy. By MSC, the equation by 49 occupies rank 1, while model I of 11 occupies rank 30. The AIC gives the inverse order: the equation by 49 has rank 30 and model I of 11 has rank 1. At the same time, the equation by 49 has 8 parameters against 17 for model I of 11, that is, only 47% as many. Consequently, the AIC criterion does not follow the parsimony principle here, under which the smaller the number of parameters, the smaller the AIC tends to be.
In summary, there is a tendency for the AIC criterion to improve as the number of parameters increases, which contradicts the theory on which the AIC criterion was defined. In finite samples, the AIC value is only approximate 33. Therefore, difficulties could arise regarding the validity and applicability of the method for this purpose.
Additionally, the MSC criterion also showed inconsistencies between the models due to the number of parameters; however, this coincides with the results of the AIC criterion. This trend in the results corresponds with those results obtained by 36.
The global ranking obtained in Table 1 integrates the positions of the most and least accurate approximation models with their degrees of complexity. The explicit Equation 32 proposed by 16 leads the Global Ranking in first position as the most accurate, followed in second place by Equations 29, 26, and 22 by 50, 31, and 46, respectively. The least accurate and most complex to solve are Equations 34, 23, and 28 by 52, 47, and 49, respectively, which in turn belong to the rejected group IV.
Table 1 Preference models
| Authors | Equation No. | p (No. of parameters) | ∆f/f (Rk) | PSD (Rk) | MSC (Rk) | AIC (Rk) | Total (∑Rk) | Global Ranking (GR) |
|---|---|---|---|---|---|---|---|---|
| 21 | 2 | 11 | 14 | 20 | 17 | 14 | 76 | 14 |
| 24 | 3 | 17 | 6 | 14 | 28 | 3 | 68 | 7 |
| 37 | 5 | 6 | 30 | 3 | 3 | 28 | 70 | 9 |
| 38 | 6 | 14 | 18 | 25 | 10 | 21 | 88 | 20 |
| 39 | 7 | 19 | 9 | 13 | 23 | 8 | 72 | 11 |
| 14 I | 8 | 18 | 21 | 18 | 11 | 20 | 88 | 20 |
| 14 II | 9 | 19 | 16 | 22 | 15 | 15 | 87 | 19 |
| 17 | 11 | 39 | 5 | 8 | 25 | 6 | 83 | 18 |
| 40 | 14 | 16 | 7 | 11 | 24 | 7 | 65 | 5 |
| 41 | 15 | 9 | 23 | 27 | 8 | 23 | 90 | 22 |
| 42 | 16 | 19 | 24 | 5 | 5 | 27 | 80 | 17 |
| 43 | 17 | 8 | 22 | 28 | 9 | 22 | 89 | 21 |
| 13 | 18 | 14 | 10 | 16 | 21 | 10 | 71 | 10 |
| 23 | 19 | 13 | 8 | 17 | 22 | 9 | 69 | 8 |
| 44 | 20 | 10 | 17 | 4 | 12 | 19 | 62 | 3 |
| 45 | 21 | 10 | 13 | 19 | 18 | 13 | 73 | 13 |
| 46 | 22 | 10 | 15 | 2 | 16 | 16 | 59 | 2 |
| 47 | 23 | 12 | 28 | 29 | 6 | 25 | 100 | 24 |
| 12 | 24 | 21 | 1 | 12 | 29 | 2 | 65 | 5 |
| 9 | 25 | 8 | 26 | 2 | 4 | 26 | 66 | 6 |
| 31 | 26 | 15 | 3 | 10 | 27 | 4 | 59 | 2 |
| 48 | 27 | 7 | 27 | 1 | 2 | 29 | 66 | 6 |
| 49 | 28 | 8 | 25 | 30 | 1 | 30 | 94 | 23 |
| 50 | 29 | 11 | 11 | 6 | 20 | 11 | 59 | 2 |
| 51 | 30 | 9 | 19 | 23 | 14 | 17 | 82 | 15 |
| 10 | 31 | 8 | 20 | 24 | 13 | 18 | 83 | 16 |
| 16 | 32 | 13 | 4 | 9 | 26 | 5 | 57 | 1 |
| 52 | 34 | 16 | 29 | 26 | 7 | 24 | 102 | 25 |
| 11 I | 35 | 17 | 2 | 15 | 30 | 1 | 65 | 5 |
| 11 II | 36 | 14 | 12 | 7 | 19 | 12 | 64 | 4 |
Consequently, in Table 1 an easier classification has been established for the first five global rankings according to their level of precision and simplicity. The levels run from very high, which indicates excellent precision and simplicity, to very low, which indicates an equation that is inaccurate and complex to solve because of the number of operations and parameters it contains.
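As a reading aid for Table 1, the sketch below shows how the Total column appears to be built: for the rows checked, it equals p plus the four ranks (∆f/f, PSD, MSC, AIC), and tied totals share a global rank. The dense-ranking rule and the function name are assumptions inferred from those rows, not a procedure stated by the authors, and the rule has only been checked against the five rows used here.

```python
def global_ranking(rows):
    """Sum the criterion values per model and assign a dense rank to the totals.

    rows maps a model label to (p, rk_error, rk_psd, rk_msc, rk_aic).
    Assumption: tied totals share a rank and the next rank follows directly.
    """
    totals = {name: sum(values) for name, values in rows.items()}
    distinct_totals = sorted(set(totals.values()))
    rank_of = {total: i + 1 for i, total in enumerate(distinct_totals)}
    return {name: (total, rank_of[total]) for name, total in totals.items()}


if __name__ == "__main__":
    # Five rows copied from Table 1: (p, ∆f/f Rk, PSD Rk, MSC Rk, AIC Rk)
    table_rows = {
        "16": (13, 4, 9, 26, 5),
        "31": (15, 3, 10, 27, 4),
        "46": (10, 15, 2, 16, 16),
        "50": (11, 11, 6, 20, 11),
        "44": (10, 17, 4, 12, 19),
    }
    for name, (total, rank) in global_ranking(table_rows).items():
        print(name, total, rank)
```

For these five rows the computed totals (57, 59, 59, 59, 62) and ranks (1, 2, 2, 2, 3) match the Total and Global Ranking columns of Table 1.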
To propose a new explicit approximation of the friction factor, 64 models were analyzed with Gene Expression Programming (GEP). The theoretical and experimental databases were used to train the GEP algorithm. Twenty percent of the data was reserved for validation and the rest for calibration. Only the most efficient results of GEP1, GEP2, GEP3, and GEP4 according to the performance criteria are reflected in Table 3.
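The 20/80 partition can be illustrated with a short sketch. The text does not say how the validation subset was drawn, so the random shuffle, the fixed seed, and the function name below are assumptions made only for illustration.

```python
import random


def split_calibration_validation(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and reserve a fraction of them for validation.

    Assumption: a seeded random shuffle; the source does not state the
    sampling scheme that was actually used.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(validation_fraction * len(shuffled))
    calibration = shuffled[n_validation:]
    validation = shuffled[:n_validation]
    return calibration, validation


if __name__ == "__main__":
    # Indices standing in for the 47601 data points of the study
    calibration, validation = split_calibration_validation(range(47601))
    print(len(calibration), len(validation))
```

With 47601 points and a 0.2 fraction, this reserves 9520 points for validation and leaves 38081 for calibration.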
Table 3 shows that the most significant models had Linking Functions + and *, a Number of Chromosomes of 30, a Head Size of 8, and a Number of Genes of 2 and 6. The best-performing model was GEP1, with the lowest number of functions (4), and 7 parameters including constants. The Root Mean Square Error (RMSE) was 0.078%, the Mean Absolute Error (MAE) was 0.055%, the Pearson correlation coefficient (R) was 0.99873, the ∆f/f was 6.22%, and the PSD was 1.86%.
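The performance indicators named above have standard definitions, sketched here in plain Python. The function names and the sample values are illustrative only, and the code does not assume how the study scaled its RMSE and MAE to percentages.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Made-up values, not data from the study
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.1, 1.9, 3.2, 3.8]
    print(rmse(obs, pred), mae(obs, pred), pearson_r(obs, pred))
```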
With respect to the groups defined in Figure 1 by the maximum relative error, GEP1 was classified in group III because it lies within the interval 2.5% ≤ ∆f/f ≤ 8.3%, which makes it an alternative for obtaining the friction factor quickly and easily.
Although GEP4 has the highest R and lower ∆f/f and PSD values, it achieves this with a greater number of functions, in line with 24. In addition, the GEP4 model requires a greater number of operations for its solution, making it less simple. Regarding the increase in functions, the Number of Chromosomes, Head Size, and Number of Genes partly agree with the results obtained by 51, in which GEP models improve as the number of functions increases.
Equation 40 is proposed as a new nonlinear model to determine the explicit friction factor with the lowest error, without logarithmic functions, with fast calculation, and with a closer approximation in the turbulent flow regime. It is valid for $4000 < Re < 10^{8}$ and $10^{-6} < e/D < 10^{-2}$.
Table 3 Efficient model of the GEP
4. Conclusions
Thirty explicit friction factor equations were analyzed on a base of 47601 theoretical and experimental data points and, according to the maximum relative error (∆f/f), were classified into 4 groups: group I with ∆f/f < 0.5%, group II with 0.5% < ∆f/f < 1%, group III with 1% < ∆f/f < 2%, and group IV with ∆f/f > 2%. Group I includes the most accurate explicit friction factor equations, developed by 12, model I of 11, 16, 17, 23, 24, 31, and 40. In general, the percent standard deviation (PSD) was acceptable, within 1.2% < PSD ≤ 1.9%.
The MSC and AIC selection criteria contributed to the selection of the most accurate equations to estimate the friction factor, but they presented a discrepancy in likelihood and entropy. However, the number of parameters and operations of the equations (p) was a decisive variable in obtaining the global ranking of the 30 explicit friction factor equations in Table 2. In summary, the first five global rankings were occupied by the most accurate and simple equations. Therefore, it was concluded that the equation by 16 ranked very high in accuracy and simplicity for obtaining explicit friction factors. The equations by 50 and 31 also presented very high performance. In contrast, the use of the equations developed by 47, 52, and 49 is not recommended; if they are used, it should be under specific conditions because they can produce inaccurate results. This new approach made it possible to observe that, under certain conditions, the Colebrook equation is not the most accurate at present.
With GEP, it was possible to provide a new model to determine the explicit friction factor f(Re, e/D) with the lowest degree of complexity in the turbulent flow regime. It has an RMSE of 0.078%, an MAE of 0.055%, and an R of 0.99873. Compared with the Colebrook equation, it is simpler, converges quickly, requires less computational time, and offers a good relationship between accuracy and computational efficiency.
From the analyzed explicit friction factor equations for turbulent flow, it was found that there are new equations with better efficiency indicators than the original, widely cited equations, such as those by 9, 10, and 51. In this regard, it is recommended to consider the new functions of the mathematical models as more accurate explicit approximations.
The main finding of this research is the integration of statistical tools, Python algorithms, Gene Expression Programming, and the proposed new model to determine the level of complexity and effectiveness of the explicit friction factor approximations of the Colebrook equation. Likewise, this novel information would ease the preparation of hydraulic engineering projects and the related decision-making. Following this conclusion, it is recommended to extend the analysis methods with artificial intelligence and new criteria for the selection of mathematical models.