Machine Learning

By leveraging state-of-the-art machine learning techniques to recognize complex patterns in vast amounts of data, we can more effectively and efficiently develop innovative therapies for patients.

Machine learning has been defined as the science of getting computers to act without being explicitly programmed, using algorithms and neural network models to help computer systems improve their performance. The concept is not new, dating back to the 1950s, but its modern iterations have revolutionized industries across the board.

Healthcare, and pharmacology specifically, presents a set of problems that machine learning is uniquely positioned to address. The number of small molecules in existence is staggering, and the volume of data now being generated and readily accessible is large enough that we can fully embrace the power of machine learning and deep learning. Machine learning gives our scientists the tools to review, annotate and identify novel molecules at a pace we never thought possible, and that is merely the beginning.

There is a unique opportunity to leverage diverse real-world data (RWD) and sophisticated data science algorithms to quicken the pace of clinical trials. During the development of a novel multivalent glycoconjugate vaccine to help prevent invasive extraintestinal pathogenic E. coli (ExPEC) disease (IED), Janssen R&D data scientists used electronic health records (EHR) and machine learning to identify risk factors for subjects at higher risk of developing IED, with the intention of reducing the total sample size required for the Phase 3 trials:

  • To validate and identify potential novel risk factors for IED, we trained a gradient boosting model on an EHR database to predict the occurrence of IED events from a subject’s prior medical history, past diagnoses, prescriptions, medical procedures and healthcare encounters, as well as age and gender.
  • The model independently identified the two risk factors used in the Phase 3 trial as being strongly predictive of IED. These risk factors have been fully implemented in the Phase 3 vaccine trial design to enrich the trial for high-risk participants, resulting in a 50 percent decrease in trial sample size, with associated estimated cost savings and the potential to accelerate timelines for the trial itself.
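
To make the approach above concrete, here is a minimal sketch of gradient boosting used to rank candidate risk factors. It is not the model described above: the cohort is a small synthetic one, the feature names (`risk_factor_a`, `risk_factor_b`, `unrelated_x`, `unrelated_y`) and event rates are hypothetical, and the booster is a hand-rolled version using one-split decision stumps and log loss, where a production analysis would more likely rely on an established library.

```python
import math

# Hypothetical binary features; the real EHR variables are not given in the text.
FEATURES = ["risk_factor_a", "risk_factor_b", "unrelated_x", "unrelated_y"]


def build_toy_cohort():
    """Synthetic cohort: event rate depends only on the first two features."""
    # Assumed events per 100 subjects for each (a, b) combination.
    events_per_100 = {(0, 0): 2, (1, 0): 10, (0, 1): 10, (1, 1): 30}
    X, y = [], []
    for (a, b), events in events_per_100.items():
        for c in (0, 1):
            for d in (0, 1):
                for i in range(100):
                    X.append((a, b, c, d))
                    y.append(1 if i < events else 0)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares fit of a one-split stump to the current residuals."""
    overall = sum(residuals) / len(residuals)
    best = None
    for j in range(len(X[0])):
        r0 = [r for x, r in zip(X, residuals) if x[j] == 0]
        r1 = [r for x, r in zip(X, residuals) if x[j] == 1]
        if not r0 or not r1:
            continue
        m0, m1 = sum(r0) / len(r0), sum(r1) / len(r1)
        # Reduction in squared error versus predicting the overall mean.
        gain = len(r0) * (m0 - overall) ** 2 + len(r1) * (m1 - overall) ** 2
        if best is None or gain > best[3]:
            best = (j, m0, m1, gain)
    return best


def fit_gbm(X, y, n_rounds=100, learning_rate=1.0):
    """Boost stumps on the negative gradient of log loss (y - p)."""
    base_rate = sum(y) / len(y)
    intercept = math.log(base_rate / (1.0 - base_rate))
    scores = [intercept] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, v0, v1, gain = fit_stump(X, residuals)
        stumps.append((j, v0, v1))
        importance[j] += gain  # accumulate split gain per feature
        scores = [s + learning_rate * (v1 if x[j] else v0)
                  for x, s in zip(X, scores)]
    return {"intercept": intercept, "stumps": stumps,
            "learning_rate": learning_rate, "importance": importance}


def predict_risk(model, x):
    """Predicted event probability for one subject's feature tuple."""
    score = model["intercept"]
    for j, v0, v1 in model["stumps"]:
        score += model["learning_rate"] * (v1 if x[j] else v0)
    return sigmoid(score)


X, y = build_toy_cohort()
model = fit_gbm(X, y)
ranked = sorted(zip(FEATURES, model["importance"]),
                key=lambda pair: pair[1], reverse=True)
for name, gain in ranked:
    print(f"{name}: accumulated gain {gain:.3f}")
```

In this toy setting the accumulated split gain plays the role of a feature-importance score: features that drive the outcome are selected repeatedly and rank at the top, which is the sense in which a model can "independently identify" risk factors. The predicted risk from `predict_risk` could then, in principle, be used to select a higher-risk population for enrollment.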

We are also applying machine learning to patient-level data, including EHRs and medical images, to better understand and predict patients’ outcomes. In this way, we can better guide therapeutic intervention, intercept disease and understand novel biology to guide future drug development.