Introduction
Cholelithiasis is a common surgical condition worldwide, affecting 10% to 15% of the general population1. One of its typical complications is choledocholithiasis, defined as the presence of stones in the common bile duct (CBD). Choledocholithiasis has two forms: primary and secondary. Primary choledocholithiasis refers to the formation of stones within the CBD itself; it accounts for 10% of cases and is more frequent in Asian countries2. The secondary form (90%) results from the direct migration of stones from the gallbladder and occurs in 4.6% to 20.9% of patients with cholelithiasis, depending on the context1,3-6. The prevalence and burden of the disease in Colombia are unknown, although some studies estimate that about 11.9% of patients admitted to emergency services with acute cholecystitis have choledocholithiasis7.
Secondary choledocholithiasis represents a tremendous economic burden for health care systems. For example, in the United States, it costs approximately $6.6 billion annually, including the cost of cholecystectomy as a definitive treatment8. Some studies available in Colombia report on the cost of this disease for institutions, but little is known about its economic impact on the health care system7,9,10.
The diagnosis and management of secondary choledocholithiasis have changed significantly in recent decades, mainly through the use of risk stratification models built on the interaction of clinical, biochemical, and diagnostic imaging criteria. The availability of non-invasive diagnostic tests such as nuclear magnetic resonance cholangiography (NMRC), computed axial tomography (CAT) cholangiography, and endoscopic ultrasound (EUS) has changed the diagnostic approach to this entity. So has the refinement and wider use of endoscopic treatment by endoscopic retrograde cholangiopancreatography (ERCP) in place of surgical procedures that entail greater morbidity, longer hospital stays, and higher care costs. Additionally, predictive models and new non-invasive technologies have reduced ERCP complications for diagnostic purposes up to 10%11.
However, the prediction of secondary choledocholithiasis remains a field of permanent discussion due to the persistence of limitations inherent to the available models, including:
The interaction of the variables included in the models
The validation of the models available in different practice contexts and populations
The criteria and mechanisms for determining thresholds between different levels of risk in the available models
The applicability of generic prediction models in special populations
The impact of prediction models on the cost-effectiveness of diagnosis and treatment
This article outlines these knowledge gaps and proposes questions for future research.
Variables included in prediction and interaction models
Traditionally, the risk of choledocholithiasis has been determined from clinical variables, liver biochemical profiles, and ultrasound findings. The variables most strongly associated with the presence of stones in the CBD are elevated bilirubin, with reported cut-off points between 1.3 and 4.0 mg/dL (sensitivity: 84%, specificity: 91%)1,3,12-14; bile duct dilation on ultrasound > 6-8 mm (sensitivity: 73%; specificity: 91%)3; and elevation of other biochemical tests such as alanine-aminotransferase (ALT), aspartate-aminotransferase (AST), γ-glutamyltransferase (GGT), and alkaline phosphatase (ALP), which in isolation offer a sensitivity of 81% and a specificity of 25%3,14,15. Finally, some clinical variables have been included in the diagnostic approach as isolated risk factors, such as age over 55 years (sensitivity: 57%, specificity: 49%), biliary pancreatitis (sensitivity: 7%-22.3%; specificity: 75%), cholangitis (sensitivity: 21.7%; specificity: 99%)1,11, and jaundice (sensitivity: 39%, specificity: 92%)15,16.
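The sensitivity and specificity pairs quoted above can be compared more easily as likelihood ratios. The following is a minimal sketch using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity; the function name and the dictionary labels are illustrative assumptions, and biliary pancreatitis is omitted because its sensitivity is reported as a range.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0.0 <= sensitivity <= 1.0) or not (0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity/specificity pairs as quoted in the paragraph above.
# The labels are illustrative names, not terms from any scoring system.
predictors = {
    "elevated bilirubin": (0.84, 0.91),
    "CBD dilation on ultrasound": (0.73, 0.91),
    "elevated ALT/AST/GGT/ALP": (0.81, 0.25),
    "age over 55 years": (0.57, 0.49),
    "cholangitis": (0.217, 0.99),
    "jaundice": (0.39, 0.92),
}

for name, (sens, spec) in predictors.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name:28s} LR+ = {lr_pos:6.2f}   LR- = {lr_neg:5.2f}")
```

Under these definitions, elevated bilirubin has an LR+ of about 9.3, whereas age over 55 years has an LR+ of about 1.1, which illustrates why isolated clinical variables add little on their own.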
Various prediction models for choledocholithiasis in patients with cholelithiasis have been developed from these variables. Most models derive from statistical methods, including multivariate logistic regression, based on univariate factors identified in previous studies. The prediction models are intended to stratify patients into risk groups that determine the need for diagnostic methods and treatment strategies17,18. Some models categorize patients into two risk groups (low and high)19, others into three groups (low, medium, and high)3,5,20, and others into more specific subgroups (low, medium-low, medium-high, high)7,12,21,22. Examples of this type of model are those of the American Society for Gastrointestinal Endoscopy (ASGE)5,20 and the European Society of Gastrointestinal Endoscopy (ESGE)3.
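To make the idea of a regression-based model concrete, the following sketch shows how a multivariate logistic model turns binary findings into a predicted probability. The intercept, the coefficients, and the variable names are hypothetical values chosen only for illustration; they are not taken from the ASGE, ESGE, or any other published model.

```python
import math

# HYPOTHETICAL intercept and coefficients, for illustration only.
# They do not come from any published or validated model.
INTERCEPT = -3.0
COEFFICIENTS = {
    "bilirubin_elevated": 1.5,
    "cbd_dilated": 1.2,
    "liver_enzymes_elevated": 0.6,
    "age_over_55": 0.4,
}


def predicted_probability(findings: dict[str, bool]) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i for present findings)))."""
    linear = INTERCEPT + sum(
        weight for name, weight in COEFFICIENTS.items() if findings.get(name, False)
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical patient with elevated bilirubin and a dilated CBD.
patient = {"bilirubin_elevated": True, "cbd_dilated": True}
print(f"Predicted probability of choledocholithiasis: {predicted_probability(patient):.2f}")
```

In a real model, the coefficients would be estimated from patient data, and the resulting probability would then be mapped to risk groups by cut-off values.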
The model developed by the ASGE shows a diagnostic yield close to 70% in its 2010 version and is frequently used by surgeons and gastroenterologists worldwide23. Nonetheless, it has been widely criticized, mainly because it was not derived from or validated in large population samples in clinical settings; instead, it relies on operating characteristics reported in previous studies and on expert opinion. Some studies have sought to validate this model in various contexts1,7,11,18,23 and to compare it with previous models15,16,24. Still, their results are heterogeneous and do not allow for an adequate analysis of the information. Currently, there are no meta-analyses of the available studies on which to base solid recommendations about the general applicability of this model.
The ESGE model is also widely used worldwide. Like the ASGE model, it was developed from literature reviews and expert consensus. The model includes biochemical, clinical, and ultrasound variables to stratify patients into three risk groups (high, medium, and low)3. In some studies, its operating characteristics are superior to those of the ASGE model (sensitivity: 89.57%; specificity: 96.24%)25. However, in specific groups, such as patients with acute cholecystitis, it has shown a low diagnostic yield, with a sensitivity close to 49.2% and a specificity of 87.3%26, which poses the same validation challenges as the ASGE model.
Lastly, alternative models have sought to rebalance the relative weight of each variable and their interaction, with mixed results. Some models demonstrate operating characteristics superior to those of the ASGE model11,12, while others display inferior characteristics13,22. For example, a study carried out in Colombia reported a value of 0.87 for the area under the receiver operating characteristic (ROC) curve7, consistent with the results of its external validation12. Other studies conducted in Lithuania and Saudi Arabia report similar findings13,22. New proposals, especially in the last year (2021), have sought to improve certainty in diagnostic prediction. Chisholm et al introduced a model in 2020 based on multivariate analysis in a specialized referral center to reduce the number of false positives (patients undergoing non-therapeutic ERCP) and false negatives (patients undergoing cholecystectomy with choledocholithiasis)19. Good discrimination was obtained for approximately 80% of the population, with failed ERCP rates of 22% versus 42% with the ASGE model in the high-risk group and a false-negative rate of 1.4% in the low-risk group. The remaining 20% of the population, in the intermediate group, required additional diagnostic studies.
Building on the limitations above, new studies can focus on:
Objectively assessing the interaction of the variables within the ASGE model. Studies of this type can help determine, using statistical methods, the best cut-off points for each variable and their relative weight within the model. This analysis can inform the operating characteristics of the model based on original empirical studies.
Validating the available models in different populations. Given their growing use and wide acceptance in multiple practice contexts, there is a latent need to validate the ASGE and ESGE models.
Comparing the performance of the models in terms of their operating characteristics.
Determining the performance of the models in clinical situations that may affect the behavior of the variables, such as in cases of acute cholecystitis or advanced age, which have been shown to affect the parameters of biochemical markers or the accepted average diameter of the bile duct, respectively.
Criteria and mechanisms for determining risk thresholds in available models
The problem in adjusting the risk thresholds within the models lies in defining which patients fall into the high-, medium-, or low-risk groups. For instance, the stricter the high- or low-risk criteria, the fewer patients are included in these groups and the more patients fall into the uncertain or medium-risk zone. Therefore, the definition of thresholds can affect the use of diagnostic and treatment resources. Good risk thresholds should reduce the number of patients taken to non-therapeutic ERCP, given the morbidity of the procedure, and accurately distinguish patients who require additional diagnostic studies from those who can be safely taken to surgery. These models must be adaptable to each clinical scenario, considering the available staff, experience, therapeutic resources, and cost-effectiveness.
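The effect of threshold strictness on group sizes can be illustrated with a short sketch. The cohort of predicted probabilities and both pairs of cut-off values below are hypothetical and serve only to show the mechanism; they do not represent any published model or dataset.

```python
from collections import Counter


def stratify(probability: float, low_cutoff: float, high_cutoff: float) -> str:
    """Assign a risk group from a predicted probability and two cut-offs."""
    if probability < low_cutoff:
        return "low"
    if probability > high_cutoff:
        return "high"
    return "intermediate"


def group_sizes(probabilities: list[float], low_cutoff: float, high_cutoff: float) -> dict[str, int]:
    """Count how many patients fall into each risk group."""
    counts = Counter(stratify(p, low_cutoff, high_cutoff) for p in probabilities)
    return {group: counts.get(group, 0) for group in ("low", "intermediate", "high")}


# HYPOTHETICAL predicted probabilities for ten patients (illustration only).
cohort = [0.03, 0.08, 0.12, 0.20, 0.35, 0.45, 0.55, 0.70, 0.85, 0.95]

# Looser cut-offs: fewer patients left in the intermediate zone.
print(group_sizes(cohort, low_cutoff=0.10, high_cutoff=0.50))
# Stricter cut-offs for the low and high groups: the intermediate zone grows.
print(group_sizes(cohort, low_cutoff=0.05, high_cutoff=0.80))
```

With this hypothetical cohort, tightening the cut-offs moves the intermediate group from four to seven of the ten patients, all of whom would then need additional diagnostic studies.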
Overall, the proposed prediction models show adequate global operating characteristics in the discrimination of risk groups. The ASGE model has shown a diagnostic sensitivity of 90% for detecting choledocholithiasis in the high-risk group, while in the low-risk group it is close to 5%1,3,5,10. National validations have provided similar data23; however, the intermediate-risk groups remain a concern due to low diagnostic sensitivity, between 26% and 49.5%1,18,23. Under the current risk distribution, the size of this group means that most patients undergo additional pre-surgical diagnostic studies, with consequences that remain unknown, such as an immediate increase in care costs, delay in therapeutic management, and increased complications derived from a more extended hospital stay27.
The thresholds, therefore, affect the model’s overall functionality and the use of diagnostic tests, especially in the intermediate- and low-risk groups. These aspects are essential when deciding on diagnostic methods, especially non-invasive ones. EUS and NMRC demonstrate sensitivities of 97% and 87% and specificities of 90% and 92%, respectively28. They are currently widely used, especially in the medium-risk group or uncertain zone, where invasive methods such as ERCP are not justified due to their associated risks (5%-10%)15. Clinical trials have shown that EUS and NMRC perform comparably to ERCP in diagnosing choledocholithiasis29,30. Even though EUS has a slightly higher diagnostic sensitivity than NMRC31, its applicability is limited by its low availability and operator dependency28,32.
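How much a non-invasive test resolves the uncertainty of the intermediate zone can be estimated with Bayes' theorem. The sketch below uses the sensitivity and specificity values quoted above for EUS and NMRC; the pre-test probability of 30% is an assumption chosen to represent an intermediate-risk patient, and the function name is illustrative.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Probability of disease after a positive or negative result (Bayes' theorem)."""
    if positive:
        true_pos = sensitivity * pre_test
        false_pos = (1.0 - specificity) * (1.0 - pre_test)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pre_test
    true_neg = specificity * (1.0 - pre_test)
    return false_neg / (false_neg + true_neg)


# Sensitivity and specificity as quoted in the paragraph above.
tests = {"EUS": (0.97, 0.90), "NMRC": (0.87, 0.92)}

# ASSUMED pre-test probability for an intermediate-risk patient (illustration only).
PRE_TEST = 0.30

for name, (sens, spec) in tests.items():
    after_pos = post_test_probability(PRE_TEST, sens, spec, positive=True)
    after_neg = post_test_probability(PRE_TEST, sens, spec, positive=False)
    print(f"{name}: positive result -> {after_pos:.1%}, negative result -> {after_neg:.1%}")
```

Under this assumed pre-test probability, a positive result with either test raises the probability to roughly 80%, while a negative result lowers it to about 1% for EUS and about 6% for NMRC.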
The intermediate-risk category can affect the performance of diagnostic methods such as EUS and NMRC. In the high-risk group, a non-negligible probability of false positives is accepted, reported at up to 50% with current models27, which means indicating an invasive procedure with associated risks, such as ERCP, that turns out to be non-therapeutic. Therefore, many schools today consider it worthwhile to bear the cost of confirming the high probability of choledocholithiasis determined by prediction models with diagnostic methods such as EUS in certain situations, including an increased risk of post-ERCP pancreatitis, pregnancy, or simply the need to eliminate the possibility of a non-therapeutic ERCP. What rate of non-therapeutic ERCP are we willing to accept? What percentage of patients undergoing surgery with choledocholithiasis is acceptable? These questions persist, and the answers must be adjusted to each clinical context.
Building on the limitations above, new studies can focus on:
Adapting the risk thresholds according to the risk-benefit of diagnosing all patients to reduce the number of false positives in high risk and increase the percentage of patients who can be safely taken to surgery without additional diagnostic studies
Making the risk thresholds of the ASGE model adjustable and assessing the impact on clinical diagnosis in specific population groups, providing alternatives adapted to their clinical contexts
Prospective clinical studies that validate changes to the current structures and that confirm and adapt the models in their clinical settings are needed to develop this topic and bridge knowledge gaps.
Applicability of generic prediction models to special populations
Another limitation of current risk models concerns their application to particular groups of patients: older adults (> 65 years) and patients with a history of cholecystectomy, primary choledocholithiasis, biliary obstruction of malignant origin, or pancreatitis of biliary origin. Diagnostic prediction scales designed for patients with secondary choledocholithiasis have been routinely applied to these groups, although the validation of the available models in them is limited. Applying prediction models to groups for which they were not developed or validated can alter their operating characteristics in terms of false negatives and false positives, affecting the use of resources and diagnostic tests. For example, distinguishing biliary obstruction due to malignancy from benign disease is not straightforward. Multiple clinical and laboratory predictors are altered in both entities, and the distinction can remain difficult even with invasive diagnostic methods such as ERCP and EUS. Differential risk determination is necessary, mainly to distinguish extra-hepatic cholestasis due to biliopancreatic malignancy from benign biliary pathology33.
Patients over 65 years of age are also in this category. This group, which is growing as the population ages, has unique features, such as a higher prevalence of cholelithiasis, an increased risk of complications, and pathophysiological changes in stone production34. These factors make it necessary to adjust the generic models to each group of patients to provide safe and effective treatment and reduce the risk of associated complications32,35. In addition, information on the use of these models in primary choledocholithiasis is limited. Some models for secondary choledocholithiasis have been adapted to primary choledocholithiasis. Nonetheless, their yield is suboptimal, as these entities have different pathophysiological pathways with a high recurrence rate (up to 41.7%)36.
Based on these limitations, new studies may focus on adapting and assessing the validity of predictive models for secondary choledocholithiasis in these particular groups or on creating new specific models from empirical studies in special populations.
Impact of prediction models on the cost-effectiveness of diagnosis and treatment
Regarding the cost-effectiveness of diagnosis, the impact of the models on the cost-effectiveness of non-invasive diagnostic tests (NMRC and EUS) and invasive tests (ERCP) remains under discussion. Cost-effectiveness analyses have shown that the routine use of NMRC or EUS in intermediate-risk patients reduces the care costs related to non-therapeutic ERCP and its complications37,38. Comparisons of non-invasive tests suggest that EUS offers better diagnostic yield; nonetheless, when the costs of the procedure are included and compared with those of NMRC, the latter appears to be more cost-effective37.
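The reasoning behind a test-first strategy can be sketched as a simple expected-cost comparison. The example below is deliberately simplified and is not a cost-effectiveness analysis: the prevalence and the relative unit costs are hypothetical, the sensitivity and specificity are the NMRC values quoted earlier in this article, and the costs of complications and of missed stones (false negatives) are ignored.

```python
def positive_rate(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of patients with a positive test (true plus false positives)."""
    return sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)


def cost_test_first(prevalence: float, sensitivity: float, specificity: float,
                    cost_test: float, cost_ercp: float) -> float:
    """Expected cost per patient: everyone is tested, only positives go to ERCP."""
    return cost_test + positive_rate(prevalence, sensitivity, specificity) * cost_ercp


def non_therapeutic_rate_test_first(prevalence: float, specificity: float) -> float:
    """Fraction of all patients who undergo ERCP without having stones."""
    return (1.0 - specificity) * (1.0 - prevalence)


# HYPOTHETICAL inputs, for illustration only.
PREVALENCE = 0.30   # assumed prevalence in an intermediate-risk group
COST_TEST = 1.0     # relative cost of the non-invasive test (arbitrary units)
COST_ERCP = 4.0     # relative cost of ERCP (arbitrary units)
SENS, SPEC = 0.87, 0.92  # NMRC values quoted earlier in this article

ercp_for_all = COST_ERCP  # every patient undergoes ERCP
test_first = cost_test_first(PREVALENCE, SENS, SPEC, COST_TEST, COST_ERCP)

print(f"ERCP for all: cost {ercp_for_all:.2f}, non-therapeutic ERCP {1.0 - PREVALENCE:.1%}")
print(f"Test first:   cost {test_first:.2f}, "
      f"non-therapeutic ERCP {non_therapeutic_rate_test_first(PREVALENCE, SPEC):.1%}")
```

A full analysis would also need to account for missed stones, procedure complications, and waiting times, which this sketch leaves out.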
Approaches to the cost-effectiveness of risk stratification are diverse. Kang et al demonstrated the effectiveness of the ASGE model in a population over 55 years of age, compared with the routine use of NMRC in all patients with suspected biliary obstruction, including malignant obstruction39. However, their results have been re-evaluated due to flaws in the model’s sensitivity analysis and its use in different types of populations40. Other approaches to cost-effectiveness in local studies have revealed the suboptimal yield of the ASGE model compared to national models7,12. However, the data are limited to the immediate cost for institutions and lack evaluations at the health system level.
Additionally, as a secondary objective of some clinical trials, the cost-effectiveness of endoscopic management in two stages (ERCP + delayed laparoscopic cholecystectomy) has been evaluated compared to operative management in a single stage (Rendezvous). The latter seems to be more cost-effective41. However, as noted in local studies, not all surgical teams have the necessary training and instruments to perform it10. Most studies have limitations regarding model adaptation based on different insurance systems and resource availability. This limitation has implications for timely care and the need to refer patients to institutions with a higher level of care for diagnostic studies (especially NMRC and EUS) and definitive treatment. New studies still need to discuss these aspects to determine the indirect costs derived from applying the care models to contexts with limited resources.
Given the limitations identified in the available literature regarding the cost-effectiveness of prediction models to diagnose and treat secondary choledocholithiasis, new studies may focus on evaluating:
The cost-effectiveness and economic impact of the models on health institutions and systems
Which model offers the best cost-effectiveness in different practice contexts
The adherence of professionals to models and its impact on cost-effectiveness
The clinical impact related to the long waiting times for non-invasive diagnostic tests and the need for referral, as well as the economic impact derived from these delays on the health system and the institution
These objectives should be pursued in large population samples that allow the cost-effectiveness analysis of diagnosis to be extrapolated, preferably drawn from national databases or insurance systems; they could also be included as secondary analysis objectives of clinical trials.
Conclusion
The prediction of choledocholithiasis remains controversial. Several advances have transformed risk determination, such as the refinement of endoscopic techniques for diagnostic and therapeutic purposes and the development of non-invasive diagnostic methods such as EUS and NMRC. Nevertheless, it is imperative to identify the knowledge gaps in the scientific literature to improve the available prediction models. It is still necessary to assess the variables included in the prediction models, particularly their relative weight and interaction. The available models must be externally validated in multiple contexts and populations. Likewise, studies are required on reducing the diagnostic uncertainty related to risk thresholds and on applying the models to special groups. Finally, the models’ economic impact on the health system and institutions needs further study.