Reference

Catacutan, D. B., Alexander, J., Arnold, A., & Stokes, J. M. (2024). Machine learning in preclinical drug discovery. Nature Chemical Biology, 20(8), 960–973. https://doi.org/10.1038/s41589-024-01679-1


Yellow: Interesting

Highlight ( Page )

These programs can take upward of 12 years and cost US $2.5 billion, with a failure rate of more than 90%.

Highlight ( Page )

Novel drug discovery and development can cost more than US $2.5 billion and can require 12–15 years of sustained effort [1].

Highlight ( Page )

following biological target identification and validation, chemical inhibitors are typically (but not always) found through high-throughput screening (HTS) of chemical libraries. Hit compounds from these chemical screens are further investigated for MOA elucidation, structure–activity relationships and in vivo efficacy [2,3]. However, this traditional approach is associated with a roughly 90% failure rate for candidate compounds traversing from phase 1 clinical trials to market approval, with a much higher rate when accounting for the preclinical stages of discovery [4].

Highlight ( Page )

clinical trial data suggest that the steep failure rate, even at later stages of development, can be attributed to a lack of commercial needs and poor strategic planning (approximately 10%), insufficient drug-like properties (approximately 15%), unacceptable toxicity (approximately 30%) and lack of clinical efficacy (approximately 50%) [5,6].

Highlight ( Page )

drug-discovery pipeline, particularly at the preclinical stages where training data are currently most abundant

Highlight ( Page 2)

Virtual screening: HTS, which tests large compound sets against target proteins or whole-cell cultures, has been a critical avenue for drug discovery. However, it is costly and time intensive, often taking weeks or months to screen 10^5 to 10^6 compounds and only finding a few with therapeutic potential.

Highlight ( Page 2)

• Step 1: hit identification.
• Step 2: MOA elucidation.
• Step 3: translational investigations.

Highlight ( Page 4)

protein–ligand docking can be leveraged to screen compounds based on their predicted binding affinity to a target protein.

Highlight ( Page 4)

As a concrete example, Deep Docking was used to screen 1.3 billion compounds in the ZINC15 database against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease

Highlight ( Page 4)

the chemical search space for bioactive molecules can be further expanded through generative deep learning approaches. For reference, the upper size limit of chemical libraries that can be reasonably virtually screened lies on the order of billions, a number that is dwarfed by drug-like chemical space, which is estimated to be upward of 10^60 molecules [33].
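
Note to self: a quick back-of-the-envelope check of the scale gap in the highlight above. This is only a sketch; "on the order of billions" is assumed here to mean 10^9, and 10^60 is the lower-bound estimate quoted above. The variable names are my own.

```python
# Assumed exponents: 10^9 screenable compounds ("order of billions")
# versus 10^60 drug-like molecules (the estimate quoted in the highlight).
library_exponent = 9
chemical_space_exponent = 60

# Gap between what can be screened and what exists, in orders of magnitude.
gap_orders_of_magnitude = chemical_space_exponent - library_exponent

# Fraction of drug-like chemical space covered by one billion-compound screen.
fraction_covered = 10.0 ** (library_exponent - chemical_space_exponent)

print(f"gap: {gap_orders_of_magnitude} orders of magnitude")
print(f"fraction of chemical space screened: {fraction_covered:.0e}")
```

Under these assumptions a billion-compound virtual screen samples a vanishingly small fraction of drug-like space, which is the motivation the highlight gives for generative approaches.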

Purple: To learn more

Highlight ( Page 3)

Drug Repurposing Hub, a collection [24] of around 7,000 compounds with favorable toxicity and pharmacokinetic profiles [24]. This led to the discoveries of halicin (E. coli minimal inhibitory concentration (MIC) ≈ 2 μg ml^-1) and abaucin (A. baumannii MIC ≈ 2 μg ml^-1), respectively