Using Machine Learning Models to Improve Clinicians’ Assessments of Suicide Risk

December 2, 2025

Ruta Nonacs, MD PhD

Machine learning models using EHR data significantly improved clinicians’ suicide risk assessments, offering more accurate predictions of future suicide attempts.

More than half of those who die by suicide see a health care professional during the month prior to their death; thus, clinicians play an important role in identifying individuals at risk for suicide. Assessing suicide risk is considered one of the most essential skills for mental health providers, yet there has been ongoing debate as to whether clinicians’ assessments of suicide risk are sufficiently accurate to be of clinical utility. In a recent study, researchers from the Center for Precision Psychiatry at Mass General Hospital, led by Kate Bentley, PhD, Chris Kennedy, PhD, and Taylor Burke, PhD, used data from the electronic health record (EHR) to examine the predictive value of clinician-conducted suicide risk assessments. The researchers also examined whether machine learning models incorporating the broader range of information routinely collected during these risk assessments could further improve the accuracy of predicting risk for a future suicide attempt.

Study Design

This multisite, electronic health record–based prognostic study included 89,957 patients (5 years of age or older) who were evaluated for suicide risk using a structured suicide risk assessment based on the Suicide Assessment Five-Step Evaluation and Triage (SAFE-T) framework. The SAFE-T framework, developed by the Substance Abuse and Mental Health Services Administration (SAMHSA), is a clinical protocol for assessing an individual’s risk of suicide and has been incorporated into routine behavioral health visits documented in the EHR at hospitals in the Mass General Brigham (MGB) healthcare system. Risk and protective factors are identified, and a suicide inquiry is conducted, asking specific questions about suicidal thoughts, plans, behaviors, and intent. Based on this inquiry, the clinician labels risk as minimal, low, moderate, or high.

Assessments were documented by 2577 clinicians during outpatient, inpatient, and emergency department encounters at 12 hospitals in the MGB healthcare system between July 2019 and February 2023. Of the 812,114 participants with suicide risk assessments documented in the EHR, 58.8% were female. In this cohort, 3.3% were Asian, 5.3% were Black, 3.0% were Hispanic, 77.4% were White, and 11.0% were of Other or Unknown race.

The researchers identified emergency department visits documenting an ICD-10 code for suicide attempt in the electronic health record within 90 days or 180 days of the index suicide risk assessment. Suicide attempt rates varied according to where the suicide risk assessment was conducted:

Outpatient: 0.12% attempted suicide within 90 days and 0.22% within 180 days;

Inpatient: 0.79% within 90 days and 1.29% within 180 days;

Emergency Department: 2.40% within 90 days and 3.70% within 180 days

Predictive Accuracy of Clinicians’ Assessments

The predictive performance of clinicians’ suicide risk assessments was evaluated using clinicians’ overall risk estimates from a single suicide risk assessment item indicating minimal, low, moderate, or high risk. An area under the curve (AUC) was calculated for suicide risk assessments conducted in the three different settings. The AUC measures how well a clinician’s risk estimate separates patients who will make a suicide attempt from patients who will not. Conceptually, AUC represents the probability that a randomly chosen patient who attempts suicide will have been judged at higher risk than a randomly chosen patient who does not. When the AUC is 1.0, every patient who attempts suicide was rated at higher risk than every patient who does not; when the AUC is 0.5, the clinician’s assessment is no better than flipping a coin.
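To make the pairwise interpretation of AUC concrete, the sketch below computes it directly from four-level risk ratings by comparing every patient who attempted suicide with every patient who did not. The numeric coding of the risk labels, the function name, the choice to count tied ratings as half, and the toy data are all assumptions made for illustration; this is not the study's code or data.

```python
from itertools import product

# Ordinal coding of the four risk labels (numeric codes are an illustrative assumption).
RISK_LEVELS = {"minimal": 0, "low": 1, "moderate": 2, "high": 3}


def auc_from_ratings(ratings, attempted):
    """Estimate AUC as the share of (attempter, non-attempter) pairs in which
    the attempter received the higher rating. Tied ratings count as half."""
    pos = [RISK_LEVELS[r] for r, a in zip(ratings, attempted) if a]
    neg = [RISK_LEVELS[r] for r, a in zip(ratings, attempted) if not a]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")

    score = 0.0
    for p, n in product(pos, neg):
        if p > n:
            score += 1.0   # attempter was rated higher: concordant pair
        elif p == n:
            score += 0.5   # same rating: counts as a coin flip
    return score / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
ratings = ["high", "moderate", "low", "minimal",
           "low", "moderate", "minimal", "minimal"]
attempted = [True, True, False, False, False, False, False, False]

toy_auc = auc_from_ratings(ratings, attempted)
print(f"Toy AUC: {toy_auc:.3f}")
```

Because a four-level rating produces many ties between patients, how ties are handled matters more here than it would for a continuous risk score.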

The predictive value of clinicians’ overall single-item risk estimates varied according to the setting where the first suicide risk assessment was performed:

Outpatient: AUC value of 0.77 (95% CI, 0.72-0.81) for 90-day suicide attempt prediction;

Inpatient: AUC value of 0.64 (95% CI, 0.59-0.69) for 90-day prediction;

Emergency Department: AUC value of 0.60 (95% CI, 0.55-0.64) for 90-day prediction

When assessments occurred in the setting of an outpatient visit, clinicians’ ratings showed moderate predictive accuracy (AUC = 0.77). However, in the emergency department, predictions were only slightly better than chance (AUC = 0.60). The predictive performance of clinicians’ assessments was similar when follow-up was extended to 180 days.

Can Machine Learning Models Be Used to Improve Clinicians’ Accuracy?

After estimating the accuracy of clinicians’ suicide risk assessments, the research team asked whether machine learning models could use the information gathered by the clinician as part of the suicide risk assessment (87 items) to improve the accuracy of suicide risk predictions. The best-performing machine learning models significantly increased the accuracy of 90-day risk predictions:

Outpatient: AUC value of 0.87 (95% CI, 0.83-0.90) for 90-day suicide attempt prediction;

Inpatient: AUC value of 0.79 (95% CI, 0.74-0.84) for 90-day prediction;

Emergency Department: AUC value of 0.76 (95% CI, 0.72-0.80) for 90-day prediction

The performance was similar for 180-day suicide risk predictions.
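The size of the improvement in each setting can be read off by subtracting the clinicians’ 90-day AUC from the model’s 90-day AUC. The short sketch below does that arithmetic using only the point estimates reported in this article; the variable and function names are illustrative, and a difference in point estimates is not itself a test of statistical significance.

```python
# 90-day AUC point estimates as reported in this article.
clinician_auc = {"outpatient": 0.77, "inpatient": 0.64, "emergency department": 0.60}
model_auc = {"outpatient": 0.87, "inpatient": 0.79, "emergency department": 0.76}


def auc_gain(clinician, model):
    """Absolute AUC gain per setting, rounded to two decimals."""
    return {setting: round(model[setting] - clinician[setting], 2)
            for setting in clinician}


gains = auc_gain(clinician_auc, model_auc)
for setting, gain in gains.items():
    print(f"{setting}: {clinician_auc[setting]:.2f} -> {model_auc[setting]:.2f} (+{gain:.2f})")
```

On these figures, the absolute gain is largest in the emergency department and inpatient settings, where clinicians’ single-item ratings were least accurate.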

Next Steps

The current study observed that clinicians’ single-item ratings of suicide risk performed better than chance in predicting which patients would attempt suicide within 90 or 180 days. However, the accuracy of these assessments varied significantly by setting. When assessments occurred in the setting of an outpatient visit, clinicians’ ratings showed moderate predictive accuracy (AUC = 0.77). In inpatient settings, accuracy dropped (AUC = 0.64), and when risk assessments were performed in the emergency department, predictions were only slightly better than chance (AUC = 0.60).

The predictive accuracy of suicide risk assessments improved significantly by statistically incorporating information about recent suicidal thoughts and behaviors and other factors routinely collected by the clinician during the assessment. Machine learning models utilizing all clinician-documented suicide risk assessment items (87 predictors) significantly improved 90-day and 180-day risk prediction, with the AUC increasing to 0.87 for outpatient, 0.79 for inpatient, and 0.76 for emergency department visits.

The authors suggest that models using data from clinician suicide risk assessments could be integrated into the electronic health record, ultimately providing the clinician with real-time estimates of risk immediately after conducting a suicide risk assessment. In addition, clinicians could receive alerts embedded within the EHR with suggested interventions based on risk estimates. Future research is needed to compare machine learning models using clinician suicide risk assessment (SRA) data alone to models combining SRA data with other clinical information in the EHR. Combining clinician assessments with powerful EHR-based suicide risk algorithms has the potential to improve suicide risk prediction, facilitating more targeted and timely intervention.
