1. Introduction
Rules-based triage protocols (RBTP) for live nurse triage, also known as branching tree protocols, are widely used in healthcare settings in the United States, including office practices, ambulatory clinics, health systems/hospitals, and managed care call centers. Schmitt-Thompson protocols, the industry gold standard, are essentially a telephone triage version of a medical checklist or clinical decision support tool. Schmitt-Thompson authors, licenses and supports evidence-based telehealth triage guidelines that serve as a decision support tool for live telephone care providers such as triage nurses. The protocols include symptom assessments for both pediatric and adult patients. Based on the most prevalent or worrisome presenting symptom, a triage nurse or clinician selects the correct protocol. The protocols reassure nurses and other clinicians that they are following an established process and pursuing the best possible outcome for the patient-caller, and they help the nurse progress efficiently through the needed data collection, triage, disposition selection and patient advisory processes. Components of the RBTP process include assessment, diagnosis, outcomes/planning, implementation and evaluation. Schmitt-Thompson helps healthcare systems/facilities, providers, and call centers deploy its triage protocols for live nurse telephonic triage, and provides triage protocol content to health plans, software developers and patient engagement platforms as well as care delivery organizations.
Prior studies have demonstrated that telephonic telehealth triage frequently triages higher acuity patients to appropriately receive emergency department (ED) level care.[1] Patient satisfaction with live telehealth call centers is generally high, and in one analysis most patients who reported any effect on their relationship with their primary provider assessed it as positive.[1] Compliance with recommendations for urgent evaluation or home care was relatively high in one study, but low for intermediary dispositions.[2] In a large analysis of the performance of 23 symptom checkers, those that used Schmitt-Thompson RBTPs provided more appropriate triage decisions than those that did not.[3] Accuracy of triage guidance, however, differed according to the operator of the symptom checker, with provider groups and physician associations performing at the highest level, followed by private companies and then health plans or government agencies.[3] Bartenschlager et al. recently evaluated how analytics and artificial intelligence (AI)-based extensions improved the performance metrics of human-designed RBTPs compared to those of the existing or baseline protocol. The AI-enhanced triage algorithm was superior, and integrating AI-based algorithms significantly improved the performance of the baseline human protocol.[4] Entezarjou et al. evaluated human versus automated machine learning-based triage performance in primary care triage utilizing a digitalized patient history, and found that low interrater and intrarater agreement in triage decisions among primary care providers limits the possibility of using human decisions as a reference for machine learning (ML) to automate triage in primary care.[5] In a review of 18 studies comparing diagnosis by an AI/ML protocol with human diagnosis, the AI/ML protocol improved the accuracy of human diagnosis, particularly when the clinician was less experienced.[6] Furthermore, none of the studies reported that an AI/ML protocol performed as poorly as human triage.[6]
AI-based virtual triage (VT) tools, also known as automated symptom checkers, are a digital technology accessible to patient-users 24/7/365 from any internet-connected device. AI-based VT helps patient-users evaluate their symptoms and determine the acuity and kind of care needed. It conducts a medical query process, asking about the symptoms experienced and collecting demographics such as gender and age, along with risk factors and medical history. Classification algorithms and an AI-driven inference engine consider the most probable conditions, including any particularly acute or alarming symptoms and concerning risk factors. From the patient’s responses, a probabilistic model and inference algorithm for symptom assessment uses a detailed medical knowledge base to compute the probabilities of the most likely conditions and to select the most pertinent question to ask next. Responses are analyzed in real time. AI-based VT then conveys an evaluation of the reported symptoms, identifying probable causes and severity and an appropriate acuity or level of medical care for the patient-user to pursue. The study reported here evaluated the comparative performance of RBTP versus AI-based VT with respect to triage care referral accuracy.
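To make the idea of probabilistic symptom assessment concrete, the following is a minimal sketch of one common approach: a naive Bayes update over candidate conditions, followed by choosing the next question that is expected to reduce uncertainty the most. It is not the algorithm of any specific VT product; the conditions, symptoms, probabilities and function names are all hypothetical toy values chosen for illustration.

```python
from math import log2

# Hypothetical toy knowledge base (illustrative values only, not clinical data).
PRIORS = {"common_cold": 0.70, "influenza": 0.25, "pneumonia": 0.05}
# P(symptom present | condition), assumed conditionally independent.
LIKELIHOODS = {
    "common_cold": {"fever": 0.20, "cough": 0.60, "shortness_of_breath": 0.02},
    "influenza": {"fever": 0.90, "cough": 0.80, "shortness_of_breath": 0.10},
    "pneumonia": {"fever": 0.80, "cough": 0.90, "shortness_of_breath": 0.70},
}
SYMPTOMS = ["fever", "cough", "shortness_of_breath"]


def posterior(evidence):
    """Return P(condition | evidence) for evidence like {"fever": True}."""
    scores = {}
    for condition, prior in PRIORS.items():
        score = prior
        for symptom, present in evidence.items():
            p = LIKELIHOODS[condition][symptom]
            score *= p if present else (1.0 - p)
        scores[condition] = score
    total = sum(scores.values())
    return {condition: score / total for condition, score in scores.items()}


def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def next_question(evidence):
    """Pick the unasked symptom with the lowest expected posterior entropy."""
    current = posterior(evidence)
    best, best_entropy = None, float("inf")
    for symptom in SYMPTOMS:
        if symptom in evidence:
            continue
        # Probability of a "yes" answer under the current belief.
        p_yes = sum(current[c] * LIKELIHOODS[c][symptom] for c in current)
        expected = (
            p_yes * entropy(posterior({**evidence, symptom: True}))
            + (1.0 - p_yes) * entropy(posterior({**evidence, symptom: False}))
        )
        if expected < best_entropy:
            best, best_entropy = symptom, expected
    return best  # None once every symptom has been asked


if __name__ == "__main__":
    reported = {"fever": True}
    print(posterior(reported))
    print(next_question(reported))
```

A production engine would use a far larger knowledge base and would typically stop once a confidence threshold is reached rather than exhausting the question list.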
2. Methods
2.1 Objective
The objective of this analysis was to compare the triage accuracy of an AI-based VT internet-based application to industry standard RBTPs for live triage produced by Schmitt-Thompson Clinical Content.
2.2 Schmitt-Thompson live rules-based triage protocols
Schmitt-Thompson has been a leader in live telephone triage care for over 30 years with rigorously reviewed nurse RBTPs or clinical decision support guidelines.[7] The guidelines are divided into two primary product sets: “After Hours” protocols, utilized by 95% of after-hours and managed-care call centers in North America and comprising 397 adult and 348 pediatric protocols designed to support after-hours and 24/7 call centers; and “Office Hours” protocols, employed by over 10,000 practices and clinics, featuring 234 adult and 250 pediatric protocols in a more condensed format.[7] These guidelines encompass symptom definitions, initial and triage assessment questions, targeted care advice, home care guidance, background information and first aid instructions. The After Hours protocols were used in this analysis as AI-based VT is similarly accessible 24/7. When using RBTP, after selecting the most relevant protocol/topic based on the patient’s key complaints, symptoms and age, a healthcare professional, often a triage nurse, continues the assessment and triage following a decision-tree structure to reach a triage disposition or outcome.
2.3 AI-based automated virtual triage technology
The Infermedica Symptomate AI-based VT engine is designed for general public use and completes evidence-driven automated patient-user interviews and analyses informed by over 800 diseases, 1,700 symptoms, and 300 risk factors. Leveraging AI, ML and natural language processing, AI-based VT evaluates symptoms reported by patient-users, suggesting the most probable conditions matching the presentation and history, and refers to the most clinically appropriate and safest possible care. There are no prescribed interview pathways, and given new information, the VT AI explores various clinical queries and hypotheses (as physicians do).
Before the AI-based VT interview begins, patient-users are asked about their care intention. The interview opens with questions about the patient’s gender and age and elements of past medical history, followed by a prompt to list the symptoms and complaints the user is experiencing. The AI then generates a series of yes/no, single-choice and multiple-choice questions, and the interview concludes once a confidence threshold is reached. It ends with an analysis of the reported symptoms and guidance to seek an appropriate level of care: call an ambulance for ED transport, proceed to an ED, consult a primary care or specialist physician within 24 hours, consult such a physician on an outpatient basis without a specified timeframe, or home-based self-care. Symptomate is a standalone AI-based VT engine that is not integrated with or implemented within a health system, and is available on Infermedica.com or as a mobile application. Symptomate is available in 24 languages. Over 15 million Symptomate evaluations have been completed since 2012.[8]
In Europe, virtual triage technologies are classified as Class I medical devices under the Medical Device Directive (93/42/EEC); in the US they fall under the Food, Drug & Cosmetic Act. The Food & Drug Administration (FDA) currently exercises enforcement discretion, which means the technology is not required to comply with FDA regulations related to medical devices.
2.4 Patient vignettes and results mapping
The triage performance of Schmitt-Thompson RBTPs was evaluated against that of a widely utilized AI-based VT solution, Symptomate, using 149 clinical vignettes. A set of 45 vignettes from Semigran et al. was supplemented with 105 patient cases derived from Case Files in Emergency Medicine, 100 Cases in Clinical Medicine, and Case Files: Family Medicine and adapted by a team of three physicians from Western University in London, Ontario.[3, 9,10,11] All vignettes included a list of patient complaints, expected condition, and expected triage and urgency assessment. Because the two modalities offer different categories of triage urgency assessment, their triage levels were mapped to a standard set of three expected triage categories: emergent care needed, non-emergent care needed and self-care appropriate (see Table 1).
| RBTP disposition | Mapped triage category | AI-based VT disposition |
|---|---|---|
| Call EMS 911 immediately | Emergent care needed | Call an ambulance |
| Go to ED immediately | Emergent care needed | Go to an ED |
| Go to ED/UCC immediately (or to outpatient care with PCP approval) | Emergent care needed | |
| Seek outpatient care immediately | Non-emergent care needed | Consult a physician within 24 hours |
| Seek outpatient care today | Non-emergent care needed | Consult a physician |
| Seek outpatient care today or tomorrow | Non-emergent care needed | |
| Seek outpatient care within 3 days | Non-emergent care needed | |
| Seek outpatient care within 2 weeks | Non-emergent care needed | |
| Home/self-care | Self-care appropriate | Stay at home, observe symptoms |
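The Table 1 mapping can be sketched as a pair of lookup tables plus a helper that labels an assigned triage level as correct, overtriage or undertriage relative to the expected level. The disposition strings come from Table 1; the short category keys, the acuity ranking and the function names are illustrative assumptions rather than part of the study's tooling.

```python
# Disposition labels are taken from Table 1; the category keys are shorthand.
RBTP_TO_CATEGORY = {
    "Call EMS 911 immediately": "emergency",
    "Go to ED immediately": "emergency",
    "Go to ED/UCC immediately (or to outpatient care with PCP approval)": "emergency",
    "Seek outpatient care immediately": "non_emergent",
    "Seek outpatient care today": "non_emergent",
    "Seek outpatient care today or tomorrow": "non_emergent",
    "Seek outpatient care within 3 days": "non_emergent",
    "Seek outpatient care within 2 weeks": "non_emergent",
    "Home/self-care": "self_care",
}

VT_TO_CATEGORY = {
    "Call an ambulance": "emergency",
    "Go to an ED": "emergency",
    "Consult a physician within 24 hours": "non_emergent",
    "Consult a physician": "non_emergent",
    "Stay at home, observe symptoms": "self_care",
}

# Assumed ordering of acuity, lowest to highest.
ACUITY_RANK = {"self_care": 0, "non_emergent": 1, "emergency": 2}


def to_category(modality, disposition):
    """Map a modality-specific disposition to one of the three categories."""
    table = {"RBTP": RBTP_TO_CATEGORY, "VT": VT_TO_CATEGORY}[modality]
    return table[disposition]


def triage_outcome(assigned, expected):
    """Label an assigned category relative to the expected category."""
    diff = ACUITY_RANK[assigned] - ACUITY_RANK[expected]
    if diff == 0:
        return "correct"
    return "overtriage" if diff > 0 else "undertriage"


if __name__ == "__main__":
    assigned = to_category("RBTP", "Seek outpatient care today")
    print(assigned, triage_outcome(assigned, "self_care"))
```

Keeping the mapping in data rather than in branching logic makes it easy to audit against the published table.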
Of the 149 patient vignettes used to evaluate both triage modalities, 88% described adult patient clinical presentations. The low number of pediatric vignettes evaluated was dictated by the limited number available to draw from. In total, 59 vignettes (39.6%) required emergency care, 69 (46.3%) required non-emergent care, and 21 (14.1%) required self-care (see Table 2).
| Age group | Emergency care | Non-emergent care | Self-care | Total |
|---|---|---|---|---|
| Children (< 18 years old) | 4 (22.2%) | 7 (38.9%) | 7 (38.9%) | 18 (12.1%) |
| Adults (18+ years old) | 55 (42.0%) | 62 (47.3%) | 14 (10.7%) | 131 (87.9%) |
| Acuity Level Totals (Percent) | 59 (39.6%) | 69 (46.3%) | 21 (14.1%) | 149 (100%) |
2.5 Patient scenarios testing
To run a clinical vignette through virtual triage, a physician entered the key presenting complaints and demographics. During the interactive portion of the VT interview, a finding was confirmed only if it was directly stated in the vignette description; otherwise it was reported as absent. For RBTP, a physician selected the most clinically appropriate triage protocol based on the chief initial complaints and age, and advanced through the decision tree until a triage assessment was completed. A different physician then validated that the right protocol had been used; if the two could not reach a consensus, a third physician independently evaluated the vignette to make a final determination, and the interaction was restarted. Each of the 149 vignettes was evaluated initially with both modalities by two physicians, and a third and a fourth physician subsequently reviewed the results independently for errors and inconsistencies; if any were found, the triage interview was restarted and completed again.
2.6 Calculation of triage performance accuracy metrics
Results were uploaded to a database. For each triage level, precision was computed by matching each modality’s triage assessment against the expected triage assessment, and sensitivity (recall) and F1 score (the harmonic mean of precision and sensitivity) were calculated according to the formulae shown in Table 3. We also compared the number of clinical data elements that each modality typically gathered to complete a vignette (symptoms, risk factors, and past medical history, excluding age and gender).
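| Metric | Formula |
|---|---|
| Precision | $\dfrac{TP}{TP + FP}$ |
| Sensitivity | $\dfrac{TP}{TP + FN}$ |
| F1 Score | $\dfrac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$ |

Here $TP$, $FP$ and $FN$ denote the true positives, false positives and false negatives for a given triage level.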
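As a worked illustration of these metrics, the sketch below implements the standard definitions of precision, sensitivity and F1 and applies them to the self-care counts reported in the results (8 of 13 self-care recommendations correct for AI-based VT and 11 of 13 for RBTP, out of 21 self-care vignettes). The function names are illustrative, and the false-negative counts are derived here by subtraction from the 21 expected self-care vignettes.

```python
def precision(tp, fp):
    """Share of a modality's recommendations at a level that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp, fn):
    """Share of vignettes expected at a level that were triaged to it."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, s):
    """Harmonic mean of precision and sensitivity."""
    return 2 * p * s / (p + s) if (p + s) else 0.0


if __name__ == "__main__":
    # (true positives, false positives, false negatives) for self-care.
    self_care = {"AI-based VT": (8, 5, 13), "RBTP": (11, 2, 10)}
    for modality, (tp, fp, fn) in self_care.items():
        p, s = precision(tp, fp), sensitivity(tp, fn)
        print(f"{modality}: precision={p:.1%} sensitivity={s:.1%} "
              f"F1={f1_score(p, s):.1%}")
```

Run as a script, this reproduces the self-care precision, sensitivity and F1 values reported in Table 5 for both modalities.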
3. Results
3.1 Triage performance accuracy
Both modalities achieved > 70% triage accuracy (see Table 4) applying the calculations shown in Table 3, and their safety performance was identical at 91% (defined as a triage assessment that is not lower than the expected triage urgency level). AI-based VT was more accurate in care referral for emergency and non-emergency care cases. AI-based VT overtriaged to emergencies 50% less frequently than RBTP, but this difference did not attain statistical significance due to inadequate power/small sample size. RBTP more accurately detected when a patient’s clinical presentation warranted self-care. This difference was not, however, statistically significant.
| Expected triage level | Modality | Correct triage | Overtriage | Undertriage |
|---|---|---|---|---|
| Emergency care required | RBTP | 47 (79.7%) | - | 12 (20.3%) |
| Emergency care required | AI-based VT | 50 (84.7%) | - | 9 (15.3%) |
| Non-emergent care required | RBTP | 49 (71.0%) | 18 (26.1%) | 2 (2.9%) |
| Non-emergent care required | AI-based VT | 56 (81.2%) | 8 (11.6%) | 5 (7.2%) |
| Self-care appropriate | RBTP | 11 (52.4%) | 10 (47.6%) | - |
| Self-care appropriate | AI-based VT | 8 (38.1%) | 13 (61.9%) | - |
| Total | RBTP | 107 (71.8%) | 28 (18.8%) | 14 (9.4%) |
| Total | AI-based VT | 114 (76.5%) | 21 (14.1%) | 14 (9.4%) |
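The overall accuracy and safety figures can be rederived from the Table 4 counts. The sketch below is illustrative: the data structure and function name are assumptions, the "-" cells are treated as zero (an emergency vignette cannot be overtriaged and a self-care vignette cannot be undertriaged), and safety is computed as the share of assessments not lower than the expected urgency level, as defined in the text.

```python
# (correct, overtriage, undertriage) counts per expected triage level, Table 4.
TABLE_4 = {
    "RBTP": {
        "emergency": (47, 0, 12),
        "non_emergent": (49, 18, 2),
        "self_care": (11, 10, 0),
    },
    "AI-based VT": {
        "emergency": (50, 0, 9),
        "non_emergent": (56, 8, 5),
        "self_care": (8, 13, 0),
    },
}


def summarize(counts):
    """Return overall accuracy and safety for one modality."""
    correct = sum(c for c, _, _ in counts.values())
    over = sum(o for _, o, _ in counts.values())
    under = sum(u for _, _, u in counts.values())
    total = correct + over + under
    return {
        "n": total,
        "accuracy": correct / total,
        # Safe = triaged at or above the expected urgency level.
        "safety": (correct + over) / total,
    }


if __name__ == "__main__":
    for modality, counts in TABLE_4.items():
        result = summarize(counts)
        print(f"{modality}: n={result['n']} "
              f"accuracy={result['accuracy']:.1%} safety={result['safety']:.1%}")
```

Both modalities come out at 135 safe assessments of 149 (90.6%), which the text rounds to 91%.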
Table 5 conveys the statistical metrics calculated for each triage modality by acuity level. AI-based VT attained higher precision, sensitivity and F1 scores in emergency acuity level vignettes, but less so in self-care vignettes compared to RBTP. Both modalities demonstrated decreased sensitivity as care urgency/acuity decreased, which was more pronounced in AI-based VT (84.7% vs. 38.1%) than in live triage using RBTP (81.0% vs. 52.4%).
| Triage disposition | Correct of total recommendations | Precision | Sensitivity | F1 score |
|---|---|---|---|---|
| AI-based VT call for emergency care (ambulance transport) | 15 of 16 | 93.8% | 84.8% | 89.0% |
| AI-based VT seek emergency care | 35 of 42 | 83.8% | 84.8% | 84.0% |
| RBTP call EMS 911 immediately | 13 of 15 | 86.7% | 81.0% | 83.8% |
| RBTP seek ED care immediately | 19 of 29 | 65.5% | 81.0% | 72.5% |
| RBTP seek ED/UCC care immediately (or outpatient care with PCP approval) | 15 of 21 | 71.4% | 81.0% | 75.9% |
| AI-based VT outpatient consultation within 24 hours | 29 of 41 | 70.7% | 81.2% | 75.6% |
| AI-based VT outpatient consultation (timeframe unspecified) | 27 of 37 | 73.0% | 81.2% | 76.9% |
| RBTP seek outpatient care: | | | | |
| | 9 of 17 | 52.9% | 71.0% | 60.7% |
| | 5 of 9 | 55.6% | 71.0% | 62.3% |
| | 9 of 11 | 81.8% | 71.0% | 76.0% |
| | 13 of 16 | 81.2% | 71.0% | 75.8% |
| | 13 of 18 | 72.2% | 71.0% | 71.6% |
| AI-based VT self-care | 8 of 13 | 61.5% | 38.1% | 47.1% |
| RBTP home/self-care | 11 of 13 | 84.6% | 52.4% | 64.7% |
3.2 Vignette evidence gathered
The two modalities differ in the number of data elements typically collected to complete the evaluation of a vignette and arrive at a final disposition (see Table 6). On average, AI-based VT collects more than four times as much evidence from the vignette (9.9 vs. 2.1 data elements). With RBTP the patient typically initiates triage by reporting only a single symptom, which is then responded to by the live triage nurse. In contrast, AI-based VT allows the patient to articulate any number of symptoms, with all of the reported information processed through its AI to yield one or more likely diagnoses and appropriate acuity care referrals.
| Number of symptoms presented in clinical vignette | 6.0 | - | 14.8 | - |
| Mean number of required data fields per vignette to initiate and complete evaluation using rules-based triage protocols | 1.2 | 19.8% | 2.1 | 14.0% |
| Mean number of required data fields per vignette to initiate and complete virtual triage evaluation and care referral | 4.7 | 77.9% | 9.9 | 67.2% |
4. Discussion
Both triage modalities demonstrated comparable results despite using very different technologies. In this study AI-based VT more accurately referred in emergency and non-emergency care cases, and over-triaged/referred to the ED 50% less often than RBTP (not statistically significant). AI-based VT performed referral to self-care less favorably. Because AI-based VT is fully automated, and operates without a human to verify the result live, AI-based VT sacrifices specificity in self-care recommendations to ensure safety levels comparable to live telephonic triage, which utilizes evidence-based protocols combined with the clinical judgment of a triage clinician conducting the interview.
The ability of a human intermediary to override or adapt a decision-tree interview is an advantage of live telephone triage, because it enables complaints reported by patients to be assessed accurately, for example by judging whether shortness of breath reported by the patient meets a clinical definition of dyspnea. On the other hand, selecting the most appropriate protocol in order to make an informed decision on triage disposition remains a subjective decision of the healthcare professional. In contrast, AI-based VT engines are deterministic: given the same input (symptoms, past medical history and demographic information), the same version of AI-based VT software will return the same triage disposition. Unlike live triage, AI-based VT performance is not limited by the extent of a clinician’s training and experience with the technology. Rather, it is limited by the ability of AI-based VT to evaluate patient complaints and generate the queries needed to make a clinical determination and care referral, using language that is understandable and actionable for patients. RBTP, however, is more demanding of human resources and is inherently limited by the availability of nurses to use it, which entails hiring, scheduling, paying salaries, and ensuring adequate service coverage outside of office hours.
Although the differences in triage accuracy between the two modalities are not statistically significant, each uses notably different amounts of clinical/reported symptomatic evidence to determine an appropriate acuity level. AI-based VT also gathers considerably more information in order to present a clinical disposition than a clinician does when following a live telephonic decision tree protocol. The collection of more patient and clinical presentation data/information creates additional value, for example capturing information that enables a more personalized digital care journey for the patient, and the ability to prepopulate clinical visit notes to save physicians and nurses time in their workflows. Such captured data in the aggregate can also be leveraged to yield important population health insights for healthcare delivery organizations.
Clearly, Schmitt-Thompson’s extensive experience in live triage has enabled validation of the most critical symptoms and risk factors that need to be gathered during a patient encounter to optimize accuracy and safety. Nonetheless, live triage productivity is limited by call center throughput capacity, number of triage clinicians and average duration of a patient interview. A live triage nurse conducting telephone triage has access to information beyond evidence gathered in a structured or automated AI-based VT interview, such as patient tone of voice, respiratory rate and so on, that may also influence a final clinical determination, details of which are unavailable for AI-based VT, which is based on a simple question-answer sequence model.
A strength of this analysis is that the clinical vignettes were prepared by independent physicians at Western University in Ontario, and were not developed by either modality’s organization or clinicians. Furthermore, while the number of vignettes tested was modest (149), it was greater than the sets of 45-50 or fewer vignettes frequently used in prior evaluations of AI-based symptom checkers.[10,11,12] A study limitation is that the vignettes drew on clinical cases designed for care episodes that necessitated evaluation and treatment by a live clinician, not episodes where self-care is appropriate.[7, 9,10,11] Thus an increase in the number of self-care vignettes evaluated could improve AI-based VT performance. On the other hand, unlike in AI-based VT, the choice of an appropriate clinical protocol when using RBTP is made by a triage clinician, so the accuracy of protocol selection, and thus RBTP performance, may improve with clinician experience.
5. Conclusions
This analysis shows that both tools, AI-based VT and RBTP, are comparable in terms of triage accuracy and disposition safety, despite substantial differences in the technology and methodology deployed. Health systems, care delivery and payor organizations seeking to advance their pre-visit procedures, and to avert avoidable downstream care acuity and associated costs, should assess the differential benefits of a fully automated and predictable AI-based VT modality versus a live clinical triage capability. While AI-based VT can provide accurate, safe triage recommendations (typically at a lower cost), these organizations should assess how it compares to implementing and sustaining a live clinical triage capability with respect to organizational priorities, budgetary considerations, characteristics of the patient/member population served, and the existing technological environment.
Authors’ contributions
KK, NJ, JJ, TP and GAG were involved in the study design, data analysis and presentation, and all participated in written manuscript preparation. PMO provided oversight and direction to the research and writing team.
Ethical statement
This analysis was not based on an experimental design utilizing human subjects and none were involved in completing this study. No formal ethical committee review was needed or pursued.
Funding
This work had no external financial support.
Conflicts of Interest Disclosure
All authors are either medical advisors to or employees of Infermedica.
Ethics approval
The Publication Ethics Committee of the Sciedu Press. The journal’s policies adhere to the Core Practices established by the Committee on Publication Ethics (COPE).
Provenance and peer review
Not commissioned; externally double-blind peer reviewed.
Data availability statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Data sharing statement
No additional data are available.
Acknowledgements
The authors are grateful to Forson Chan, MD; Amanda Singh, MD; and Jocelyn Peters, MD at the Schulich School of Medicine & Dentistry at Western University in London, Ontario for their assistance.
References
1. Schmaus A, Cooper I, Whitten T. Impact of health link utilization on emergency department visits. CJEM. 2023;25(5):429-433. doi:10.1007/s43678-023-00504-3
2. Kempe A, Luberti A, Hertz A. Delivery of pediatric after-hours care by call centers: A multicenter study of parental perceptions and compliance. Pediatrics. 2001;108(6):E111. doi:10.1542/peds.108.6.e111
3. Semigran H, Linder J, Gidengil C. Evaluation of symptom checkers for self-diagnosis and triage: Audit study. BMJ. 2015:351. doi:10.1136/bmj.h3480
4. Bartenschlager C, Grieger M, Erber J. COVID-19 triage in the emergency department 2.0: How analytics and AI transform a human-made algorithm for the prediction of clinical pathways. Health Care Manag Sci. 2023;26(3):412-429. doi:10.1007/s10729-023-09647-2
5. Entezarjou A, Bonamy A, Benjaminsson S. Human versus machine learning-based triage using digitalized patient histories in primary care: Comparative study. JMIR Med Inform. 2020;8(9):e18930. doi:10.2196/18930
6. Dang A, Dang D, Vallish B. Extent of use of artificial intelligence & machine learning protocols in cancer diagnosis: A scoping review. Indian J Med Res. 2023;157(1):11-22. doi:10.4103/ijmr.IJMR_555_20
7. Schmitt-Thompson Clinical Content. The Guidelines - Schmitt-Thompson Clinical Content. https://www.stcc-triage.com/the-guidelines
8. Gellert G, Orzechowski P, Price T. A multinational survey of patient utilization of and value conveyed through virtual symptom triage and healthcare referral. Frontiers in Public Health. 2023. doi:10.3389/fpubh.2022.1047291
9. Case Files: Emergency Medicine. September 15, 2012. ISBN: 0071768548
10. 100 Cases in Clinical Medicine. March 22, 2007. ISBN: 0340926597
11. Case Files: Family Medicine. March 10, 2016. ISBN: 1259587703
12. Hill M, Sim M, Mills B. The quality of diagnosis and triage advice provided by free online symptom checkers and apps in Australia. Medical Journal of Australia. 2020:514-19. doi:10.5694/mja2.50600

