Around the world, more than 5 million hectares are cultivated with tomatoes (Solanum lycopersicum), from which approximately 171 million tons of fruit are harvested, for an average yield of 33.98 t ha⁻¹ (FAOSTAT, 2018).
These data demonstrate the great economic and social importance of this crop. In Brazil, the cultivated area in the main producing regions totaled 37,398 ha in 2016, of which 18,674 ha were for table tomatoes and 18,724 ha for industry (Kis and Carvalho, 2017). Globally, 4,782,753 ha are cultivated, with production reaching 177,042,359 t (FAOSTAT, 2018).
Usually, when working with the tomato crop, a large number of variables are measured in order to obtain a data set that allows a wide range of evaluations and statistical analyses. When numerous variables are studied at the same time, correlations can be calculated between them, which are important for selecting characteristics of interest for plant breeding (Moreira et al., 2013); for this, Pearson's correlation is used. However, care must be taken because, in many cases, the correlation is not a real measure of cause and effect, which can lead to misunderstandings during the interpretation of the data. Path analysis, in contrast, is a statistical method capable of recognizing cause-and-effect relationships (Wright, 1921) by unfolding the correlation coefficients into direct and indirect effects of independent variables on a dependent variable. According to Rafiei and Saeidi (2005), this method is more coherent than simple linear correlations because the latter do not provide accurate information on each characteristic.
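As a point of reference for the correlations discussed above, the following is a minimal sketch of Pearson's linear correlation coefficient in plain Python. The function name `pearson_r` and the sample values are hypothetical and are not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-plant records (illustrative only, not data from this study).
fruits_per_plant = [18, 22, 25, 30, 34]
fruit_mass_g = [1500.0, 1900.0, 2050.0, 2600.0, 2900.0]
r = pearson_r(fruits_per_plant, fruit_mass_g)
print(f"r = {r:.3f}")
```

A coefficient close to 1 here only says that the two variables move together; it does not say which one, if either, drives the other, which is the gap path analysis is meant to address.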
In this type of cause-and-effect analysis, multicollinearity between the explanatory variables is common, and it may cause misunderstandings in the interpretation of the results (Cruz et al., 2012; Olivoto et al., 2017). Thus, it is extremely important to perform a multicollinearity diagnosis before carrying out the path analysis in order to obtain more accurate estimates of the direct and indirect effects on the dependent variable studied (Lúcio et al., 2013; Toebe and Cargnelutti Filho, 2013).
In the Stepwise method, variable selection is performed automatically by statistical packages, producing a model with the variables that explain the behavior of the dependent variable; the method can also be used to identify variables that cause multicollinearity in linear regression analysis (Zhang, 2016). Criteria for selecting variables include the adjusted R-squared, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC), among others (Hocking, 1976; Hosmer et al., 1989).
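For reference, the three selection criteria named above are usually written as follows. The notation is an assumption introduced here and is not taken from the original text: n is the number of observations, p the number of predictors, k the number of estimated parameters, and \hat{L} the maximized likelihood of the fitted model.

```latex
\begin{align}
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}
\end{align}
```

A higher adjusted R-squared and lower AIC or BIC indicate the preferred model; BIC penalizes additional parameters more heavily than AIC once n exceeds about 7, since ln n > 2.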
In this context, the objective of this study was to identify and estimate the relationships between the variables of production components and the total productivity of tomato fruits.
This study was conducted at the Federal University of Santa Maria (UFSM), Campus Frederico Westphalen (27°23' S, 53°25' W and 493 m altitude) over two years of cultivation (2012 and 2013). According to the Köppen classification, the region's climate is Cfa, humid subtropical, with an average annual precipitation of 1,800 mm that is well distributed throughout the year, and subtropical from the thermal point of view (Alvares et al., 2013).
The soil was prepared for cultivation with the conventional system. For planting the seedlings, furrows with a depth of approximately 20 cm were made, followed by basic fertilization according to the soil analysis and the recommendation of the Soil Chemistry and Fertility Commission (CQFSRS/SC, 2004).
The hybrid seedlings of the Italian type group (Netuno and San Vito) were produced in polystyrene trays with 128 cells in a greenhouse. Carolina® commercial substrate was used, and, after sowing, the trays were kept in a floating system. Transplanting occurred on September 4, 2012 and January 26, 2013, when the seedlings had five definitive leaves. Spacings of 1.0 m between rows and 0.5 m between plants were used. The plants were trained vertically on a single stem with a wire when they reached 15 cm in height. Drip irrigation was used to meet the water requirements of the crop. During the cycle, the shoot leaves were removed every 2 d, and cover fertilization was carried out every 10 d, according to the soil analysis and recommendations for cultivation.
The experiment was conducted as a 2 × 3 × 3 factorial in a randomized complete block design, the factors being two hybrids (Netuno and San Vito), three doses of boron (H₃BO₃: 0, 2 and 4 g/pit) and three frequencies of floral calcium application (CaCl₂ at 0.6%: no application, application every 7 d, and application every 14 d) (Plese et al., 1998), totaling 18 treatments with four replicates and 20 plants per plot.
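The factorial structure described above can be enumerated with a short sketch. The variable names are hypothetical, and the plot and plant totals are simply derived from the stated 18 treatments, four replicates and 20 plants per plot, assuming these figures apply to each cultivation year.

```python
from itertools import product

# Factor levels as described in the text.
hybrids = ["Netuno", "San Vito"]
boron_g_per_pit = [0, 2, 4]
calcium_schedule = ["no application", "every 7 d", "every 14 d"]

# Every combination of the three factors is one treatment.
treatments = list(product(hybrids, boron_g_per_pit, calcium_schedule))

replicates = 4
plants_per_plot = 20
n_plots = len(treatments) * replicates      # plots per cultivation year (derived)
n_plants = n_plots * plants_per_plot        # plants per cultivation year (derived)

for hybrid, boron, calcium in treatments[:3]:
    print(hybrid, boron, calcium)
print(len(treatments), n_plots, n_plants)
```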
At 60 d after transplanting the seedlings, when all plants had produced the seventh floral cluster, the height (ALT, in cm) of the plants was measured from the base to the apex using a tape measure. The fruits were harvested when they had yellowish spots. After harvesting, the fruits were counted and weighed daily using a digital scale, resulting in the following variables: total number of fruits per plant (NTF); total mass of fruits (MTF, g/plant); average fruit mass (MMF, g/plant); commercial fruit mass (MFC, g/plant) and non-commercial fruit mass (MFNC, g/plant); number of commercial fruits (NFC) and non-commercial fruits (NFNC); mean fruit diameter (D, cm) measured with a caliper; and total productivity (PROD, g).
To evaluate the relationships between the variables, Pearson's linear correlation coefficients were estimated. To perform the path analysis without multicollinearity, the explanatory variables were selected with the Stepwise method, which retained NTF, MTF, D, NFC and NFNC. After this selection, a multicollinearity diagnosis was performed among the explanatory variables using the condition number, $NC = \lambda_{\max} / \lambda_{\min}$, which represents the ratio of the largest to the smallest eigenvalue of the correlation matrix, and the variance inflation factor, $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ is the coefficient of determination of the regression of the j-th explanatory variable on the remaining explanatory variables. After the diagnosis, NFNC was excluded because it caused high multicollinearity.
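The two diagnostics can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' R workflow: the condition number is computed as the ratio of the largest to the smallest eigenvalue of the correlation matrix (here with a Jacobi rotation scheme), and each variance inflation factor is read from the diagonal of the inverse correlation matrix, which equals 1/(1 - R²) for the corresponding variable. The matrix uses the rounded 2012 correlations from Table 1 for NTF, MTF, D and NFC, so the results are not expected to reproduce the reported values exactly.

```python
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def symmetric_eigenvalues(matrix, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (ascending)."""
    n = len(matrix)
    a = [[float(v) for v in row] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):          # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def condition_number(corr):
    eig = symmetric_eigenvalues(corr)
    return eig[-1] / eig[0]

def variance_inflation_factors(corr):
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]

# Rounded 2012 correlations from Table 1 (order: NTF, MTF, D, NFC).
names = ["NTF", "MTF", "D", "NFC"]
corr_2012 = [
    [1.00, 0.98, 0.25, 0.93],
    [0.98, 1.00, 0.26, 0.94],
    [0.25, 0.26, 1.00, 0.36],
    [0.93, 0.94, 0.36, 1.00],
]
nc = condition_number(corr_2012)
vifs = variance_inflation_factors(corr_2012)
print("NC:", round(nc, 2))
for name, v in zip(names, vifs):
    print(name, round(v, 2))
```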
The path analysis was carried out with the Pearson correlation matrix, using productivity (PROD) as the dependent variable. The path analysis coefficients were obtained with the methodology proposed by Cruz et al. (2012), using the equation $Y = p_{01}X_1 + p_{02}X_2 + \dots + p_{0n}X_n + p_{\varepsilon}u$,
where Y is the dependent variable; $p_{0i}$ is the direct effect coefficient of the i-th explanatory variable; $X_i$ is an explanatory (independent) variable; $p_{\varepsilon}$ is the residual effect; and u is the standardized residual variable.
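A minimal sketch of how direct and indirect effects can be obtained from a correlation matrix is given below. It assumes the usual path-analysis formulation in which the direct effects p solve the normal equations R_xx p = r_xy, the indirect effect of X_i through X_j is r_ij p_j, the coefficient of determination is the sum of p_i r_iy, and the residual effect is the square root of 1 - R². The function names are hypothetical, and the inputs are the rounded 2012 correlations from Table 1, so the output is illustrative and will not match Table 2 digit for digit.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def path_analysis(r_xx, r_xy):
    """Split each correlation r_iy into a direct effect and indirect effects."""
    direct = solve(r_xx, r_xy)
    n = len(direct)
    # effects[i][j]: direct effect of X_i when i == j, otherwise the
    # indirect effect of X_i on Y through X_j.
    effects = [[r_xx[i][j] * direct[j] for j in range(n)] for i in range(n)]
    r2 = sum(p * r for p, r in zip(direct, r_xy))
    residual = math.sqrt(max(0.0, 1.0 - r2))
    return direct, effects, r2, residual

# Rounded 2012 correlations from Table 1 (order: NTF, MTF, D, NFC).
names = ["NTF", "MTF", "D", "NFC"]
r_xx = [
    [1.00, 0.98, 0.25, 0.93],
    [0.98, 1.00, 0.26, 0.94],
    [0.25, 0.26, 1.00, 0.36],
    [0.93, 0.94, 0.36, 1.00],
]
r_xy = [0.98, 1.00, 0.26, 0.94]   # correlations of each variable with PROD

direct, effects, r2, residual = path_analysis(r_xx, r_xy)
for name, row in zip(names, effects):
    print(name, [round(v, 4) for v in row], "sum =", round(sum(row), 2))
```

Because the rounded correlation between MTF and PROD is 1.00, the solution places essentially all of the direct effect on MTF and close to none on the other variables; each row of `effects` sums back to the corresponding correlation with PROD.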
Statistical analyses were performed at 5% significance with the MASS and agricolae packages available in the R program (R Core Team, 2017).
The Pearson correlation analysis between the variables revealed several significant correlations. For the year 2012, ALT was only weakly correlated with the other variables. However, as expected, the NTF variable presented a strong correlation with the variables MTF, NFC and MFC (0.98, 0.93 and 0.91, respectively) and a negative correlation with the variables NFNC and MFNC (-0.65 and -0.62, respectively). The variable MTF was positively correlated with four of the ten variables, NTF, NFC, MFC and PROD (0.98, 0.94, 0.94 and 1.00, respectively), and negatively correlated with NFNC and MFNC (-0.69 and -0.66, respectively) (Tab. 1).
Table 1. Estimates of the Pearson correlation coefficients for the variables height (ALT, cm), total number of fruits per plant (NTF), total mass of fruits (MTF, g/plant), average fruit mass (MMF, g/plant), commercial fruit mass (MFC, g/plant) and non-commercial fruit mass (MFNC, g/plant), number of commercial fruits (NFC) and non-commercial fruits (NFNC), mean fruit diameter (D, cm) and total productivity (PROD, g) of two Italian tomato cultivars produced under different doses of boron and calcium application, in 2012 (upper diagonal) and 2013 (lower diagonal).
Variables | ALT | NTF | MTF | D | NFC | NFNC | MFC | MFNC | PROD | MMF |
ALT | - | 0.03 ns | 0.04 ns | 0.18 ns | 0.05 ns | -0.06 ns | 0.07 ns | -0.09 ns | 0.04 ns | 0.08 ns |
NTF | 0.22 ns | - | 0.98* | 0.25 ns | 0.93* | -0.65* | 0.91* | -0.62* | 0.98* | 0.28 ns |
MTF | 0.31 ns | 0.88* | - | 0.26 ns | 0.94* | -0.69* | 0.94* | -0.66* | 1.00* | 0.46 |
D | -0.16 ns | 0.02 ns | 0.22 ns | - | 0.36 ns | -0.41* | 0.35 ns | -0.40* | 0.26 ns | 0.14 ns |
NFC | 0.09 ns | 0.77* | 0.84* | 0.25 ns | - | -0.89* | 0.98* | -0.84* | 0.94* | 0.40* |
NFNC | 0.15 ns | -0.14 ns | 0.14 ns | -0.40 | -0.53* | - | -0.87* | 0.94* | -0.69* | -0.47* |
MFC | 0.18 ns | 0.69* | 0.89* | 0.37 ns | 0.95* | -0.55* | - | -0.87* | 0.94* | 0.48* |
MFNC | 0.21 ns | -0.13 ns | 0.07 ns | -0.42 | -0.49* | 0.94* | -0.51* | - | -0.66* | -0.43* |
PROD | 0.31 ns | 0.88* | 1.00* | 0.23 ns | 0.84* | -0.14 ns | 0.89* | -0.07 ns | - | 0.46* |
MMF | 0.28 ns | 0.10 ns | 0.56* | 0.48* | 0.41* | -0.50* | 0.63* | -0.34 ns | 0.56* | - |
The variable NFC presented a significant and positive correlation with NTF, MTF, MFC and PROD (0.93, 0.94, 0.98 and 0.94, respectively), but was negatively correlated with NFNC and MFNC (-0.89 and -0.84, respectively). NFNC showed a strong positive correlation with MFNC (0.94) and a negative correlation with the other variables, indicating that the higher the number of non-commercial fruits, the lower the production in general. The same was observed for the variable MFNC. The MFC and PROD variables presented strong correlations with the NTF (0.91 and 0.98), MTF (0.94 and 1.00) and NFC (0.98 and 0.94) variables, but were negatively correlated with NFNC (-0.87 and -0.69) and MFNC (-0.87 and -0.66) (Tab. 1).
For the year 2013, the relationships between the variables followed the same trend as in 2012; the correlations were somewhat weaker, but still significant. Contrary to 2012, the variable MMF presented a higher correlation with the variables MTF, NFC and MFC (0.56, 0.41 and 0.63, respectively) (Tab. 1).
For the two years (2012 and 2013), the Stepwise method selected the NTF, MTF, D, NFC and NFNC variables for path analysis. However, the use of these variables resulted in high condition number (NC) and variance inflation factor (VIF) values; thus, the NFNC variable was discarded from the analysis, which corrected the multicollinearity problem. After this exclusion, the NC values were 165 and 30 for 2012 and 2013, respectively. The VIF values for 2012 were 27.58, 31.30, 1.20 and 9.55. For 2012, the multicollinearity was moderate; however, it did not imply serious problems because the values were not much above the recommended limits. For 2013, the VIF values (5.55, 7.03, 1.37 and 3.64) were low.
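The reading of these diagnostics can be made explicit with a small sketch. The cut-offs below are assumptions, since the text does not state which limits were used: they are the thresholds commonly cited in the path-analysis literature (NC below 100 as weak, 100 to 1,000 as moderate to strong, above 1,000 as severe, and VIF above 10 as a warning sign). The function names are hypothetical.

```python
def classify_condition_number(nc):
    """Classify multicollinearity from the condition number (assumed cut-offs)."""
    if nc < 100:
        return "weak"
    if nc <= 1000:
        return "moderate to strong"
    return "severe"

def high_vif(vifs, threshold=10.0):
    """Return the VIF values above the assumed warning threshold."""
    return [v for v in vifs if v > threshold]

# Values reported in the text after NFNC was excluded.
reported = {
    2012: {"nc": 165, "vif": [27.58, 31.30, 1.20, 9.55]},
    2013: {"nc": 30, "vif": [5.55, 7.03, 1.37, 3.64]},
}

for year, d in reported.items():
    print(year, classify_condition_number(d["nc"]), high_vif(d["vif"]))
```

Under these assumed cut-offs, the 2012 condition number falls in the lower part of the moderate range and two of the four 2012 VIF values exceed 10, whereas all 2013 diagnostics are below the limits.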
In 2012, the decomposition of the linear correlations into direct and indirect effects presented a coefficient of determination of 94% and a very low residual effect, showing that the selected variables explained a large part of the observed variation. The direct effects of NTF (-0.000019) and NFC (0.000002) were negligible, whereas their correlations with PROD were strong and positive, revealing that these correlations arose through indirect effects. For MTF, both the direct effect (1.000018) and the correlation (1.00) were high and positive; PROD was therefore fully explained by MTF, that is, there was a cause-and-effect relationship between these variables. On the other hand, D had no cause-and-effect relationship with the main variable PROD (Tab. 2).
In 2013, the decomposition of the linear correlations into direct and indirect effects showed a coefficient of determination of 99% and a low residual effect, so the selected variables explained a large part of the variation in the main variable PROD. The results were similar to those of 2012, with only minor changes in the intensity of the effects. The direct effects of NTF (0.00219) and NFC (0.00142) were negligible, whereas their correlations with PROD were strong and positive, revealing that these correlations arose through indirect effects. The variable MTF showed a cause-and-effect relationship with PROD, as seen in 2012, and D had no direct or indirect effects on the main variable PROD (Tab. 2).
Table 2. Path analysis evaluated in two cultivars of Italian tomato submitted to doses of boron and calcium applications, involving the dependent variable productivity (PROD) and the explanatory independent variables, with the split of Pearson's correlations into components of direct effect (main diagonal, underlined) and indirect effect (in the row) for the years 2012 and 2013.
Characteristic | NTF | MTF | D | NFC | r |
2012 |
NTF | -0.000019 | 0.981312 | -0.000002 | 0.000002 | 0.98 |
MTF | -0.000018 | 1.000018 | -0.000002 | 0.000002 | 1.00 |
D | -0.000005 | 0.261503 | -0.000008 | 0.000001 | 0.26 |
NFC | -0.000017 | 0.938043 | -0.000003 | 0.000002 | 0.94 |
Residual effect | 1.17 × 10⁻⁹ |
R² | 0.94 |
2013 |
NTF | 0.00219 | 0.87185 | -0.00002 | 0.00109 | 0.88 |
MTF | 0.00192 | 0.99623 | 0.00026 | 0.00120 | 1.00 |
D | -0.00004 | 0.22384 | 0.00115 | 0.00035 | 0.23 |
NFC | 0.00169 | 0.84073 | 0.00028 | 0.00142 | 0.84 |
Residual effect | 7.7 × 10⁻⁴ |
R² | 0.99 |
NTF: total number of fruits per plant; MTF: total mass of fruits; D: mean fruit diameter; NFC: number of commercial fruits; r: Pearson's correlation coefficient.
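In a path analysis, the direct effect plus the indirect effects in each row should add up to the Pearson correlation in the last column. The sketch below checks this property for the values transcribed from Table 2, allowing half a unit in the second decimal place because r is reported with two decimals. The data structure and function name are hypothetical.

```python
# Rows of Table 2: effects in the order NTF, MTF, D, NFC, followed by the reported r.
table2 = {
    2012: {
        "NTF": ([-0.000019, 0.981312, -0.000002, 0.000002], 0.98),
        "MTF": ([-0.000018, 1.000018, -0.000002, 0.000002], 1.00),
        "D":   ([-0.000005, 0.261503, -0.000008, 0.000001], 0.26),
        "NFC": ([-0.000017, 0.938043, -0.000003, 0.000002], 0.94),
    },
    2013: {
        "NTF": ([0.00219, 0.87185, -0.00002, 0.00109], 0.88),
        "MTF": ([0.00192, 0.99623, 0.00026, 0.00120], 1.00),
        "D":   ([-0.00004, 0.22384, 0.00115, 0.00035], 0.23),
        "NFC": ([0.00169, 0.84073, 0.00028, 0.00142], 0.84),
    },
}

def rows_not_matching(table, tolerance=0.005):
    """List (year, variable) pairs whose effects do not sum to the reported r."""
    mismatches = []
    for year, rows in table.items():
        for name, (effects, r) in rows.items():
            if abs(sum(effects) - r) >= tolerance:
                mismatches.append((year, name))
    return mismatches

print(rows_not_matching(table2))
```

All eight rows pass this check, so the decomposition in Table 2 is internally consistent with the reported correlations.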
The variables ALT, D and MMF did not present a significant correlation with the other variables. Similarly, Sari et al. (2017) evaluated the linear relationships between cherry tomato characteristics and found a weak linear effect of the variables fruit length per plant, average fruit width per plant and average fruit weight per plant on the variables number of bunches per plant, number of fruits per plant and number of fruits per bunch.
In a study carried out by Kumar et al. (2013), a positive relationship was observed between fruit yield per plant and number of fruits per plant. Contrary to what was found in the present study, the same authors observed that fruit weight showed a strong correlation with fruit length and fruit diameter. The relationships between characteristics may change according to the cultivar or the management adopted for the crop (Fallahi et al., 2017). Thus, the management adopted in the present study, with different doses of boron and calcium applications, may have interfered with the relationships between the characteristics, since boron actively participates in floral induction and in the maintenance of flowers in plants (Perica et al., 2001), leading to higher fruit production and, consequently, higher yields. In addition, boron participates in pollen tube formation and fruit formation, which helps prevent deformed fruits and thus reduces the number and mass of non-commercial fruits. In this study, as the positively correlated variables increased, fruit production increased, which was reflected in negative correlations with the variables NFNC and MFNC.
Calcium is an essential macronutrient for plants because it actively participates in the extracellular stimuli and intracellular responses that control a large number of endogenous processes (Edel et al., 2017). In addition, it has a role in cell wall structure and acts on factors that control plant development and responses to biotic and abiotic stresses (González-Fontes et al., 2017). This nutrient is also important for plant growth and fruit yield (Bastías et al., 2010). Thus, when tomato plants are under stress or suffer a nutrient deficiency, the linear relationships between variables may be different.
The high values of the coefficient of determination (R²) and the low residual effects indicated that the direct and indirect effects were estimated with precision (Rios et al., 2012; Donazzolo et al., 2017). In the present study, the low values of NC and VIF may also have contributed to the high R² values and low residual values in both evaluated years.
The results obtained from the path analysis were interpreted in the same way as in Lúcio et al. (2013). That is, when the Pearson correlation coefficient was positive and the direct effect was negative or low, the indirect effects caused the correlation. Conversely, if the Pearson correlation coefficient was low and the direct effect was positive and high, the indirect effects were responsible for the lack of correlation. When Pearson's correlation was negative and the direct effect was positive and high, the indirect effects were eliminated from the analysis, and only the direct effect was considered.
Using and interpreting Pearson's correlation between variables alone can lead to biased results, since it does not separate direct from indirect effects. Path analysis therefore provides more reliable results that overcome the limitations of Pearson's correlation (Cruz et al., 2012). This was observed in the present study: although the correlation between PROD and NTF was high, when this coefficient was partitioned into direct and indirect effects, the NTF variable did not influence the PROD variable through a direct cause-and-effect relationship.
In both evaluated years, the path analysis revealed a cause-and-effect relationship between the independent variable MTF and the dependent variable PROD. This relationship was expected, since productivity is determined through the total mass of fruits. Contrary to the present study, Sari et al. (2017) evaluated the linear relationships between characteristics of cherry tomatoes and concluded that production is directly related to the number of fruits produced and that the individual weight of each fruit has little influence on total production. In the present study, NTF had little influence on the main variable PROD. Rodrigues et al. (2010) observed that the variables mean weight of fruits and total number of fruits had high magnitudes of direct and indirect effects on total fruit production in salad-type tomatoes.