NLP effectively measures SDOH in EHRs, says Regenstrief report

Researchers from Regenstrief Institute and Indiana University developed three new natural language processing algorithms to extract housing, financial and employment data.

By Andrea Fox

July 13, 2023

10:47 AM

Photo: John Fedele/Getty

Without the need for deep learning and neural network models, researchers were able to use machine learning to extract social determinants of health information on housing challenges, financial stability and employment status from unstructured patient data in electronic health records, a new research report from Regenstrief Institute shows.

WHY IT MATTERS

Nonmedical factors that influence health outcomes, like a patient's occupation, health insurance coverage, marital status, size of household, address and frequency of address changes, are SDOH that are often locked in unstructured data in electronic health records.

Often discussed at medical appointments, SDOH data is frequently recorded as text within clinical notes, according to new peer-reviewed research announced by the Regenstrief Institute.

"The challenge for healthcare organizations is effectively measuring and identifying patients with social risks so that they can intervene," said Joshua Vest, Regenstrief Institute Research Scientist and Indiana University's Fairbanks School of Public Health faculty member, in a statement.

The study is one of the first to apply NLP to SDOH, according to Vest.

The researchers obtained a training corpus of notes – including all clinical documentation – from two diverse Indianapolis-area health systems in the Indiana Network for Patient Care's multihospital health system.

They extracted 1,710,124 clinical notes for 581,205 unique patients, created between January 1, 2019, and December 31, 2019, from a nonprofit health system serving largely privately insured individuals. A further 724,308 notes for 74,239 unique patients, documented between September 1, 2020, and March 31, 2021, came from a safety-net hospital with multiple health clinics.

"We purposefully selected these different sources to support model generalizability," the researchers said in their report, published in JAMIA Open.

The notes were used in their raw form, and researchers used patient identifiers to link them to clinical and demographic information – such as age, gender, race, ethnicity, rural/urban status and rubrics like the Modified Townsend Index and Charlson Comorbidity score.

Researchers designed the AI model to run in the background, read all the notes and create tags or indicators showing that a patient's record contains data suggesting a possible concern about a social indicator related to health.

"Our overall goal is to measure social determinants well enough for researchers to develop risk models and for clinicians and healthcare systems to be able to use these factors – housing challenges, financial security and employment status – in routine practice to help individuals and to provide a better understanding of the overall characteristics and needs of their patient population," Vest added.

The researchers discussed the choice of a state machine-based approach when "a majority of analytical efforts involving free-text datasets have shifted towards complex, resource-intensive approaches such as neural networks and deep learning to identify and classify various social factors."

They said that while those models may yield superior performance, they are complex, require a high degree of technical expertise and significant computing resources, and pose scalability issues as well as a risk of bias.

"In any machine learning application, researchers and practitioners must make choices between the tradeoffs of performance, implementability and maintainability," they said.

Finite-state machine methods may not be as sophisticated as neural network-based approaches, the researchers acknowledge, but they note "several advantages" due to their simplicity.

"In many cases, rules-based systems are more transparent, easier to communicate to non-experts and therefore more easily implemented in other health systems," researchers said.

As a result, the NLP models may be more generalizable, resulting in "more consistent predictive performance across health systems," they added.

They also say their methods limit bias by utilizing human intervention in the development of state machines, and by leveraging a unique set of clinical notes.
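
For readers unfamiliar with the technique, the following is a minimal sketch of how a finite-state machine might tag a clinical note for a possible housing concern. It is not the researchers' algorithm: the cue words, the negation handling, the state names and the function has_housing_concern are all hypothetical, chosen only to illustrate why such rules are easy to inspect and explain.

```python
import re
from enum import Enum, auto


class State(Enum):
    SCAN = auto()     # reading text, no negation in effect
    NEGATED = auto()  # a negation cue was seen earlier in this sentence
    FLAGGED = auto()  # a non-negated housing cue was found


# Hypothetical cue lists for illustration only; a real system would rely on
# vocabularies curated and reviewed by clinical experts.
HOUSING_CUES = frozenset({"homeless", "evicted", "eviction", "shelter"})
NEGATION_CUES = frozenset({"no", "not", "denies", "denied"})
SENTENCE_ENDS = frozenset({".", ";", "!", "?"})


def token_kind(token: str) -> str:
    """Map a token to the input symbol consumed by the state machine."""
    if token in SENTENCE_ENDS:
        return "END"
    if token in NEGATION_CUES:
        return "NEG"
    if token in HOUSING_CUES:
        return "CUE"
    return "OTHER"


# Explicit transition table; any pair not listed leaves the state unchanged.
TRANSITIONS = {
    (State.SCAN, "NEG"): State.NEGATED,
    (State.SCAN, "CUE"): State.FLAGGED,
    (State.NEGATED, "END"): State.SCAN,  # negation ends with the sentence
}


def has_housing_concern(note: str) -> bool:
    """Return True if the note contains a housing cue that is not negated."""
    state = State.SCAN
    for token in re.findall(r"[a-z]+|[.;!?]", note.lower()):
        state = TRANSITIONS.get((state, token_kind(token)), state)
        if state is State.FLAGGED:
            return True
    return False
```

Under these assumptions, a note reading "Patient was evicted last month." would be tagged, while "Patient denies eviction." would not, because the negation cue moves the machine into a state where housing cues are ignored until the sentence ends. Every rule sits in a short, readable table, which is the kind of transparency the researchers describe.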

"As clinical notes represent a different data generation process than coding or screening surveys, NLP could be applied as part of an overall social health measurement strategy," they said.

"It is important to not discard clinical text in favor of screening or other structured methods for data collection."

THE LARGER TREND

Previously, Regenstrief Institute researchers including Dr. Shaun Grannis, vice president for data and analytics, successfully demonstrated they could predict patients in need of a referral to a social service, such as a nutritionist, with an app they named Uppstroms.

The institute has also worked with Indiana University on machine learning models before, such as training AI models with health information exchange data to forecast patient COVID-19 hospitalizations.

Bringing the "bread-and-butter data generated by healthcare systems together with public health decision-making" had been a challenge, Grannis said in a statement last year.

The researchers say these studies show how AI models can use clinical data with "considerable performance accuracy."

ON THE RECORD

"Our work helps advance the field in both application and methodology," Vest said in the statement.

"We demonstrated that a relatively simplistic natural language processing approach could effectively measure social determinants instead of using more sophisticated deep learning and neural network models. These later models are powerful but complex, difficult to implement, and require a lot of expertise, which many health systems don't have."

Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.
