How to find a drug: the past, present and future of small molecule drug discovery

Drugbaron Blog

July 19, 2022

Despite the current hype around so-called “advanced therapies”, which range from gene editing to cell therapies, and the inexorable advance of biologic therapeutics such as monoclonal antibodies, even in 2022 the majority of drugs in development and reaching patients are still small organic molecules.

From enzyme inhibitors to receptor antagonists, allosteric modulators to suicide substrates, small molecules can modulate protein function in uniquely diverse ways. But finding the right molecule has never been easy.

For centuries, medicines were found using phenotypic screens: chewing willow bark relieved pain, and fractionation of the bark yielded the active ingredient, salicin, from which salicylic acid and ultimately aspirin were derived. It’s an effective search strategy because you are using a direct measure of the outcome you want to achieve to guide the search. But its throughput is very limited, since most phenotypic assays are highly burdensome.

Low throughput is acceptable in a search when you already have a good starting point (which is why phenotypic assays could guide the isolation of an active ingredient from a mixture, for example). What happens, though, if you don’t have any kind of starting point to hand?

Starting in the middle of the 20th century, molecular science began to provide the answer. Looking for molecules that bind to a particular molecular target is much quicker and easier than assessing function in a whole organism or even a cultured cell type. And through to the end of the 20th century, our industry got progressively quicker as high-throughput screening (HTS) became the go-to technology for finding “hits” (the first molecules with a particular target-binding profile).

HTS yielded the chemical starting points for many of today’s established medicines, from antacids to blood pressure controlling drugs. But it has its limitations too. While the throughput might be a million times higher than that of a phenotypic screen, it is still limited to testing a tiny fraction of all available compounds (it’s practicable to screen millions of compounds through a typical HTS assay, but not billions or more) – and of course you are limited to compounds of which you have a physical supply. Constructing larger and more diverse screening libraries was, for a while, the competitive advantage of the largest pharma companies.

Around the turn of the century, increases in computing capability opened up an opportunity to expand the search space by several more orders of magnitude, with the arrival of in silico screening. Rather than physically testing a library of compounds for binding to a target, you could now model the interaction and estimate the binding affinity. And you could do it for millions of compounds with a fraction of the effort required to set up an HTS. Yes, it was imperfect (as the statistician George Box famously said, “all models are wrong, but some are useful”), but by selecting a much smaller subset of compounds for subsequent testing in a physical binding assay, in silico screening became a useful “front end” to a screening hierarchy: test many millions of molecules in silico, a few hundred in a binding assay and a few tens in a functional assay, and you quickly have one (or, if you are lucky, more than one) excellent starting point for medicinal chemistry.
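The screening hierarchy described above is essentially a funnel in which each stage tests only what the previous stage passed on. The short Python sketch below makes that bookkeeping explicit. The stage names follow the text, but the library size and the number of molecules advanced at each stage are hypothetical round numbers chosen for illustration, not figures from any real campaign.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    name: str
    tested: int    # molecules entering this stage
    advanced: int  # molecules passed on to the next stage


def build_funnel(library_size: int, keep: List[Tuple[str, int]]) -> List[Stage]:
    """Chain the stages so each one tests exactly what the previous one advanced."""
    stages: List[Stage] = []
    n_in = library_size
    for name, n_out in keep:
        n_out = min(n_out, n_in)  # a stage cannot advance more than it tested
        stages.append(Stage(name, n_in, n_out))
        n_in = n_out
    return stages


# Hypothetical numbers, loosely following "many millions / a few hundred / a few tens".
funnel = build_funnel(
    5_000_000,
    [("in silico screen", 300), ("binding assay", 30), ("functional assay", 1)],
)

for stage in funnel:
    print(f"{stage.name:<17} tested {stage.tested:>9,}  advanced {stage.advanced}")
```

The point of the structure is that the expensive, information-rich assays at the bottom only ever see the handful of molecules that survived the cheap stages above them.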

The evolution of small molecule discovery did not stop there, however. In its first guise, in silico screening relied on libraries of already known (and ideally already existing) molecules. After the virtual screen was complete, the interesting subset of compounds would be ordered in for physical testing. But restricting the search to known compounds is actually a huge limitation. It’s hard to estimate how many stable small organic molecules there are with a molecular weight below 500, but it is likely many more than you imagine – perhaps as many as 10^60, a number that dwarfs any collection that could ever be synthesised. Whatever the true number is, it is so much larger than the number of compounds that have ever been made in a chemistry laboratory (which is of the order of 10^12) that the limitation of virtual screening instantly becomes clear: we are barely scratching the surface of small-molecule diversity.
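To put those orders of magnitude side by side, the sketch below stores each count as a base-10 exponent (so the arithmetic stays exact) and reports what fraction of the estimated space each collection covers. The 10^60 and 10^12 figures are the estimates quoted in the text; treating a large HTS collection as 10^6 compounds is an illustrative assumption based on the “millions” mentioned earlier.

```python
# Orders of magnitude, stored as base-10 exponents so the arithmetic is exact.
# These are rough estimates, not measurements.
LOG10_COUNTS = {
    "possible small molecules (MW < 500)": 60,
    "compounds ever synthesised": 12,
    "compounds in a large HTS collection": 6,  # assumption: "millions"
}
UNIVERSE = "possible small molecules (MW < 500)"


def coverage_exponent(name: str) -> int:
    """Return k such that count(name) / count(UNIVERSE) == 10**k."""
    return LOG10_COUNTS[name] - LOG10_COUNTS[UNIVERSE]


for name in LOG10_COUNTS:
    print(f"{name:<36} covers 10^{coverage_exponent(name)} of the space")
```

Even on these rough estimates, everything ever made covers around 10^-48 of the space, which is why expanding the searchable space, rather than screening faster, became the main goal.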

Of course, if a molecule with the right properties lies within the known search space, all will be well. But very often the best hits offer little hope of yielding a successful candidate through medicinal chemistry optimisation. And if the desired binding profile is at all rare or complex, you may very well find nothing useful at all.

The solution came in the form of so-called “chemical spaces”. These are virtual libraries constructed by combining simple reagents using simple reaction schemes in combinatorial fashion – but unlike a classical combinatorial library used in HTS, you don’t actually make all the compounds. They only “exist” in silico, but you know, at least in principle, how to make any of them easily should you need to. This allows a chemical space to be much larger than any practical physical compound library: early examples, such as the EnamineREAL library, reached 10^14 or so “potential” compounds, and the largest described to date, the Merck MASSIV library, as many as 10^20. That is still a tiny fraction of all possible small molecules, but many orders of magnitude more than are available when the search is restricted to previously known compounds.
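The key trick is that a chemical space can be counted, and sampled, without ever being built. The minimal sketch below assumes a deliberately simplified model in which each reaction scheme joins one reagent from each of its “slots”. The reaction and reagent names are hypothetical placeholders, and real chemical spaces also encode reaction compatibility rules that are omitted here.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List, Tuple

# Hypothetical reaction schemes, each with one list of interchangeable reagents per slot.
REACTIONS: Dict[str, List[List[str]]] = {
    "amide_coupling": [["acid_A", "acid_B", "acid_C"], ["amine_1", "amine_2"]],
    "suzuki_coupling": [["boronic_X", "boronic_Y"], ["halide_P", "halide_Q", "halide_R"]],
}


def space_size(reactions: Dict[str, List[List[str]]]) -> int:
    """Count products without building them: sum over reactions of the product of slot sizes."""
    return sum(prod(len(slot) for slot in slots) for slots in reactions.values())


def enumerate_products(
    reactions: Dict[str, List[List[str]]],
) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Lazily yield (reaction, reagent combination); nothing is built until requested."""
    for name, slots in reactions.items():
        for combo in product(*slots):
            yield name, combo


print(space_size(REACTIONS))  # 3*2 + 2*3 = 12
print(next(enumerate_products(REACTIONS)))
```

Because the size is a product of slot sizes, it grows multiplicatively: three slots of 10,000 reagents each already give `prod([10_000] * 3)`, or 10^12, products from a single reaction scheme, which is how spaces of this kind reach sizes far beyond any physical library.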

Chemical spaces can also increase diversity – previously known compounds tend to be clustered, reflecting both a focus on established synthesis routes and similarity to “interesting” compounds (there had to be a reason why anyone made them in the first place). Because they demand synthetic tractability, chemical spaces are still not random with respect to overall structural diversity, but at least they avoid the bias towards compounds resembling those with previously known properties.

But as chemical spaces grew into uncharted territory, a new problem emerged: in silico screening for binding is computationally expensive, and while screening billions of compounds is practicable, the largest chemical spaces are billions of times larger than that. It is not, today, even close to practicable to screen a whole chemical space using classical in silico screening. So screening chemical spaces is limited to computationally simple (“2D”) assessments of similarity to known hits. This can be, and has been, very useful for finding new chemical matter for previously studied targets, which in turn can give access to interesting new binding profiles as well as superior ADME or other drug-like properties.
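A common form of “2D” similarity assessment compares binary structural fingerprints using the Tanimoto coefficient (bits set in both fingerprints divided by bits set in either). The sketch below assumes fingerprints are already available as integer bitsets; the eight-bit fingerprints and compound names are made up for illustration, whereas real fingerprints are typically hundreds or thousands of bits generated by a cheminformatics toolkit. It also scores an explicitly listed library, which is exactly what the largest chemical spaces do not allow, so treat it as an illustration of the scoring rather than of how such spaces are searched at scale.

```python
from typing import Dict, List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bitsets."""
    shared = bin(fp_a & fp_b).count("1")
    either = bin(fp_a | fp_b).count("1")
    return shared / either if either else 0.0


def similarity_search(
    query: int, library: Dict[str, int], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Score every library member against the query; return hits above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up 8-bit fingerprints (hypothetical compounds).
QUERY = 0b10110110
LIBRARY = {
    "cand_1": 0b10110100,
    "cand_2": 0b01001001,
    "cand_3": 0b11110111,
}

print(similarity_search(QUERY, LIBRARY))
```

The score is cheap to compute, which is why similarity searches can cover far more molecules than docking, but it only ever finds compounds that resemble a hit you already have.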

But the holy grail is to combine the size and diversity of chemical spaces with the searching power of HTS or of in silico screens of smaller libraries. Two solutions, one physical and one computational, have emerged.

The first is the growing popularity of DNA-encoded libraries (DELs). Here, compounds are built on DNA tags in a variety of different ways, but in essence much as a virtual chemical space is constructed. Because each molecule carries a tag recording how it was made, however, DELs allow far larger libraries to be screened than conventional HTS can handle, and, importantly, allow screening for binding to proteins for which no simple binding assay previously existed. This has opened up the possibility of finding hits against so-called “undruggable” proteins.
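The decoding step is what lets DEL screening work on pooled libraries: after the library is incubated with the target and the binders recovered, the DNA tags are sequenced and each read is translated back into the building blocks used to make that compound. The sketch below assumes a hypothetical two-cycle library with four-base codons, and compares normalised read counts from a target selection with those from a no-target control. The codon tables, reads and pseudocount are illustrative assumptions; real DELs use longer codons, more cycles, error-tolerant decoding and more careful statistics.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

CODON_LEN = 4
# Hypothetical codon tables: one per synthesis cycle, mapping a DNA tag to a building block.
CYCLE_CODONS: List[Dict[str, str]] = [
    {"ACGT": "BB1-a", "TTGA": "BB1-b"},
    {"GGCA": "BB2-a", "CATG": "BB2-b"},
]

Compound = Tuple[str, ...]


def decode(read: str) -> Optional[Compound]:
    """Translate a sequencing read into the building block used in each cycle."""
    blocks = []
    for cycle, table in enumerate(CYCLE_CODONS):
        codon = read[cycle * CODON_LEN:(cycle + 1) * CODON_LEN]
        if codon not in table:
            return None  # unreadable or unknown tag: discard the read
        blocks.append(table[codon])
    return tuple(blocks)


def enrichment(
    target_reads: List[str], control_reads: List[str], pseudocount: float = 1.0
) -> Dict[Compound, float]:
    """Read frequency after target selection divided by read frequency after control selection."""
    target = Counter(c for c in map(decode, target_reads) if c is not None)
    control = Counter(c for c in map(decode, control_reads) if c is not None)
    t_total = sum(target.values()) or 1
    c_total = sum(control.values()) or 1
    return {
        cpd: ((target[cpd] + pseudocount) / t_total)
        / ((control[cpd] + pseudocount) / c_total)
        for cpd in set(target) | set(control)
    }


TARGET_READS = ["ACGTGGCA"] * 8 + ["TTGACATG"] * 2 + ["NNNNGGCA"]
CONTROL_READS = ["ACGTGGCA"] * 2 + ["TTGACATG"] * 8
scores = enrichment(TARGET_READS, CONTROL_READS)
print(max(scores, key=scores.__getitem__))  # the most enriched compound
```

Note that the tag itself is part of what gets selected, which is one route by which, as discussed next, it can interfere with the assay.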

There is, however, always a “however”… The need to synthesise the compounds with oligonucleotide tags is a big restriction on the type of compounds that populate a DEL, which tend to be larger and more linear than typical drugs, and the tag used for identification can itself interfere with the assay. In the end, history is likely to judge DELs (at least as currently constructed) as more useful in theory than in practice.

The second is the new kid on the block – adaptive chemical spaces. The idea was first described by Sadybekov and colleagues in Nature as recently as December 2021 (Nature 601, 452-459 (2022)), in an approach they called V-SYNTHES. In principle, it’s very simple: instead of creating a single, static chemical space that you search for every target, you create a bespoke chemical space optimised for the chosen target. Using in silico modelling of binding (exactly as for virtual screens), they docked a range of “virtual synthons” (small chemical building blocks) into the target site, selected those that bound best, and then “reacted” them in silico with a wide range of reagents to create new diversity, which was again screened for binding to the target. This was then repeated for a third cycle. In effect, they screened a chemical space comprising billions of molecules using computationally intensive modelling, because their search strategy focused on only a small subset of the whole space (the molecules derived from the best binders in the previous round).
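The iterative idea can be sketched as a beam-style search: dock everything in the current pool, keep only the best few, grow just those survivors with new reagents, and repeat. This is a generic sketch under simplifying assumptions, not the published V-SYNTHES implementation: molecules are represented as tuples of building-block names, “reacting” is simple concatenation, and `toy_dock` is a hypothetical additive score standing in for a real docking program (where, unlike here, lower energies are usually better).

```python
import heapq
from typing import Callable, List, Tuple

Molecule = Tuple[str, ...]  # a molecule represented by its ordered building blocks


def iterative_synthon_search(
    synthons: List[str],
    reagents: List[str],
    dock: Callable[[Molecule], float],  # hypothetical docking stand-in; higher = better here
    cycles: int = 3,
    keep: int = 2,
) -> Tuple[List[Tuple[float, Molecule]], int]:
    """Dock the pool, keep the best few, grow only those, and repeat."""
    pool: List[Molecule] = [(s,) for s in synthons]  # first cycle: bare synthons
    n_docked = 0
    best: List[Tuple[float, Molecule]] = []
    for cycle in range(cycles):
        if cycle > 0:
            # "React" each survivor in silico with every reagent to add diversity.
            pool = [mol + (r,) for mol in pool for r in reagents]
        scored = [(dock(mol), mol) for mol in pool]
        n_docked += len(scored)
        best = heapq.nlargest(keep, scored)  # retain only the top binders
        pool = [mol for _, mol in best]
    return best, n_docked


# Toy additive per-fragment scores (hypothetical numbers).
CONTRIB = {"S1": 1.0, "S2": 3.0, "S3": 2.0, "r1": 0.5, "r2": 2.5, "r3": 1.0}


def toy_dock(mol: Molecule) -> float:
    return sum(CONTRIB[frag] for frag in mol)


best, n_docked = iterative_synthon_search(["S1", "S2", "S3"], ["r1", "r2", "r3"], toy_dock)
print(best[0], n_docked)
```

In this toy run, 15 docking calls (including the synthons and intermediates) find the best of the 27 possible three-component products; the gap between docked and possible molecules widens rapidly as the number of reagents and cycles grows.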

The results were impressive and yielded novel cannabinoid receptor ligands that were experimentally verified. But Jon Heal and his team at ProsaRx (now part of RxCelerate Group) have gone a step further and leveraged machine learning to iteratively optimise binding during the construction of an adaptive chemical space – essentially the V-SYNTHES approach on steroids. The overall workflow starts with a structure of the chosen protein target (or a structural model where no experimental structure exists) and docks a range of “virtual synthons”, the best of which are selected for in silico reactions to generate larger candidate ligands, which are also docked and ranked for potential binding. At this point, the algorithm can either go forwards and repeat the cycle, or go backwards and replace one of the earlier components. Both the selection of the components and the reactions between them are therefore iteratively optimised in parallel, maximising the chemical space covered for a given computational burden.
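The algorithm is not described here in detail, so the following is only a generic illustration of the two ideas mentioned: moves that either extend a molecule (“forwards”) or swap out an earlier component (“backwards”), and a learned model that decides which candidates are worth an expensive docking run. `FragmentSurrogate` is a deliberately crude, hypothetical stand-in for a machine-learning model, and `toy_dock` is again a made-up additive score where higher is better.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Molecule = Tuple[str, ...]


class FragmentSurrogate:
    """Crude stand-in for an ML model: predicts a molecule's score as the sum of the
    mean observed contribution of each fragment (unseen fragments get a prior)."""

    def __init__(self, prior: float) -> None:
        self.prior = prior
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def update(self, mol: Molecule, score: float) -> None:
        share = score / len(mol)  # naive credit assignment: split the score evenly
        for frag in mol:
            self.totals[frag] += share
            self.counts[frag] += 1

    def predict(self, mol: Molecule) -> float:
        return sum(
            self.totals[f] / self.counts[f] if self.counts.get(f) else self.prior
            for f in mol
        )


def neighbours(mol: Molecule, fragments: List[str], max_len: int) -> List[Molecule]:
    out: List[Molecule] = []
    if len(mol) < max_len:  # forward move: add another component
        out += [mol + (f,) for f in fragments]
    for i, old in enumerate(mol):  # backward move: replace an earlier component
        out += [mol[:i] + (f,) + mol[i + 1:] for f in fragments if f != old]
    return out


def adaptive_search(
    start: Molecule,
    fragments: List[str],
    dock: Callable[[Molecule], float],
    steps: int = 10,
    dock_per_step: int = 2,
    max_len: int = 3,
    prior: float = 1.0,
) -> Tuple[Molecule, float, int]:
    model = FragmentSurrogate(prior)
    seen: Dict[Molecule, float] = {start: dock(start)}
    model.update(start, seen[start])
    current = start
    for _ in range(steps):
        candidates = [m for m in neighbours(current, fragments, max_len) if m not in seen]
        if not candidates:
            break
        # Spend expensive docking runs only on the surrogate's top predictions.
        candidates.sort(key=model.predict, reverse=True)
        for mol in candidates[:dock_per_step]:
            seen[mol] = dock(mol)
            model.update(mol, seen[mol])
        current = max(seen, key=seen.__getitem__)  # continue from the best so far
    best = max(seen, key=seen.__getitem__)
    return best, seen[best], len(seen)


FRAG_SCORE = {"A": 1, "B": 3, "C": 2}  # hypothetical per-fragment scores


def toy_dock(mol: Molecule) -> float:
    return sum(FRAG_SCORE[f] for f in mol)


best, score, n_docked = adaptive_search(("A",), ["A", "B", "C"], toy_dock)
print(best, score, n_docked)
```

On this toy problem the search reaches the best three-component molecule after docking only a fraction of the 39 molecules with one to three components. In a real setting the surrogate would be a trained model and docking would dominate the cost, so every avoided docking call matters.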

These changes to the workflow have a transformational impact. Because adaptive chemical spaces use cutting-edge 3D binding classifiers (rather than the simple “likeness” scores typical of static chemical spaces), they yield superior candidates, particularly when looking for rare or challenging binding profiles, and the hits have better drug-like properties too. Equally importantly, by using only validated reactions and starting reagents that are already in stock, the cycle time from idea to validated hits can be as little as six weeks, and the approach avoids the geopolitical risks that have hampered traditional in silico screening approaches relying on curated chemical libraries, many of which are held in Ukraine or Russia.

Technologies for discovering small-molecule drugs have come a long way, and today they utilise many of the same techniques that have dominated discovery across the last century – but in a hierarchy. And the challenge for the would-be drug discoverer has always remained the same: how to search the largest and most diverse universe of possible structures with limited resources. Adaptive chemical spaces are taking us another huge leap forwards in our never-ending quest to drug the “undruggable”.
