Proteomics – examining the panoply of proteins within a complex sample – has been around much longer than the word we use today to describe it. The word itself was coined around 1997 as a portmanteau coupling the now-ubiquitous “omics” suffix to the word protein. But two decades earlier, two-dimensional gel electrophoresis was the cutting-edge technology that first visualised the “complete” catalogue of proteins in a sample, such as a cell lysate or blood sample.
Even the earliest 2D-gels revealed the complexity of the proteome, with beautiful trails of spots across the isoelectric point axis delineating proteins with multiple charged post-translational modifications such as phosphorylations. But almost half a century later, technology has struggled to deliver a reproducible and accurate picture of this complexity. Most proteomics methodology used today provides an enumeration of the major proteins present with poorly validated attempts at quantitation and even less focus on the subtly different variants of each “protein” present in the mixture.
Unsurprisingly in a world dominated by the molecular biology of DNA, the concept of a protein has come to mean the product of translating a single mRNA from a single gene – ignoring the complexity that arises from errors, from post-translational modifications that are deliberate and regulated (such as phosphorylation) and from those that simply damage the polypeptide (such as most oxidations). All these close variants get consolidated into a single concept: in most people’s minds, a protein such as apoE is a homogeneous population of perfectly translated copies of the encoding gene; in reality it is a morass of subtly different chemical entities – so many in fact that hardly any two molecules of “apoE” are actually identical.
This chemical diversity within the population of molecules derived from a single gene has been termed “quantum resolution” proteomics, by analogy to the finer resolution of the quantum domain compared to “classical” physics. That it exists is interesting, but the real question is whether it matters. If this “quantum zoo” of subtly different variants is really nothing but noise (and all the variants have identical function), then the answer would clearly be a resounding ‘no’. But data is accumulating to suggest it matters a lot: resolving one variant from another – and understanding what drives their relative concentrations – may be just as important in biology as regulation of gene expression.
Some examples are already obvious: proteolytic cleavage is a well-known post-translational modification, not least because it creates a big change in the targeted protein. Regulated cleavage is central to some of the most important signalling cascades in biology, such as blood coagulation and complement activation. For example, thrombin is generated by the cleavage of prothrombin by the prothrombinase complex; C3b, the “engine” of the complement response, is generated by the cleavage of C3 by C3 convertases, one of which takes its catalytic subunit from complement factor B. Yet most modern proteomic protocols cannot readily distinguish thrombin from prothrombin or C3b from C3.
Many post-translational modifications are more subtle, perhaps altering only a single amino acid in the whole polypeptide. Some, such as phosphorylation events, are well understood to modify protein function and are consequently highly regulated by complex networks of kinase and phosphatase enzymes that add and remove specific phosphate groups respectively. Again, while it’s relatively straightforward to look at the pattern of phosphorylation across the proteome (the so-called “phosphoproteome”) using specific methodology, routine proteomics methods do not provide hypothesis-free quantitation of such post-translational modifications.
Then there are unregulated modifications, which include errors in the transcription and translation process as well as “damage” to the mature protein that accumulates with time. But these can also be biologically relevant. For example, it is well known that glycation of haemoglobin through non-enzymatic reaction with glucose generates a product known as HbA1c, which is a useful measure of glycemic control in diabetic subjects. There are many similar examples that have been identified by reductionist approaches – but how many more remain undiscovered because they are invisible to modern proteomic approaches?
We are about to find out.
In 2015, Methuselah Health was founded to investigate the role of proteome instability during ageing, and the extent to which the instability of specific proteins might underlie development of age-related diseases – a concept that arose from the ground-breaking work of Professor Miro Radman (which DrugBaron highlighted previously). To enable that research, the team at Methuselah Health needed a proteomics approach that would accurately quantify post-translational modifications across the proteome – and since that didn’t exist, their Chief Technology Officer, David Mosedale, set about creating it.
As a commercial drug discovery company, much of their progress has been shielded from public gaze, even as they made ground-breaking advances to the technology platform. But all that is about to change: fast forward seven years and RxCelerate has just acquired Methuselah Health and is lifting the lid on the proteomics platform they have created – and making it available to the whole drug discovery and development industry as ProQuant™.
ProQuant™ is a world-leading proprietary platform for performing LC-MS bottom-up proteomics on a wide range of biological samples, delivering quantitative accuracy unachievable with any other technology. It incorporates a number of different technological advances, both in the practical LC-MS data collection and in the subsequent bioinformatic analysis, developed by Methuselah Health over the past seven years. Together these advances unlock at least an order of magnitude improvement in peptide-level quantitation while retaining deep coverage when analysing complex protein mixtures.
Why does that matter? Improved quantitative reproducibility is not just a “nice to have” – it unlocks applications that are impossible with current state-of-the-art proteomics. Below are a few examples taken from proquant.bio:
Characterisation of proteoform landscapes. Proteins carry a wide range of post-translational modifications (PTMs) throughout their sequence, ranging from modified amino acids to cleavage sites, with such a dizzying array of variations that in principle no two copies of the same protein are ever identical. Conventional shotgun proteomics collapses the information from different peptides within the same protein to provide a modestly reproducible quantitative estimate of the “total” level of all the different proteoforms of that protein, but lacks the reproducibility to describe the proteoform landscape within a protein.
ProQuant™ allows you to extract all that rich information about the different proteoforms present within a sample and analyse how those patterns are related to any biological phenotype of interest – information that’s invisible without these advances. For example, by examining collagens in skin for cross-linking patterns, location of hydroxyprolines and cleavages, it becomes possible to investigate the impact these PTMs have on matrix integrity and ultimately skin ageing. Alternatively, we can look at the activation status of complex proteolytic signalling cascades such as the haemostasis system or the complement system, yielding insights that are normally impossible to see in a proteomic profile.
Better yet, ProQuant™ can perform all these analyses on the SAME dataset. Instead of having to guess what biological systems might be relevant to a phenotype of interest, the data can guide you. Datasets Methuselah Health generated five years ago are still yielding insights today.
Monitoring protein modification in vivo. Does a protein of interest undergo cleavage in vivo, whether intentionally or not? ProQuant™ can tell you not only how much is cleaved, but also where the major and minor cleavage sites are and how they change over time and in different tissues. In one study, ProQuant™ was used to study an engineered protein designed to be cleaved at two different sites, the first to activate it and the second to deactivate it. Prior studies had been able to follow the gross activity with time, but they couldn’t quantitate how much cleavage had occurred at each site – ProQuant™ revealed exactly what was happening at the molecular level. Even sites with as little as 0.01% cleavage could be detected and reproducibly quantified.
This capability of ProQuant™ isn’t limited to cleavage. It can be used to monitor other post-translational changes over time with a similarly stunning degree of sensitivity. For example, therapeutic antibodies suffer gradual chemical modification over time in vivo, with certain glutamine (Q) and asparagine (N) residues deamidating to glutamate and aspartate respectively, while other residues such as methionine and tryptophan become oxidised. These modified proteoforms do not necessarily affect the function of the antibody, but they can increase the risk of anti-drug antibodies forming and shorten the circulating half-life substantially. Today, it is common to look for these PTMs during antibody manufacture, but almost impossible to study them after administration in vivo. With ProQuant™ it’s straightforward.
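The percentages quoted above (for example, 0.01% cleavage at a site) can be read as a site occupancy: the fraction of molecules carrying a given change at a given position. The sketch below is a minimal illustration of that arithmetic, not the ProQuant™ algorithm, which is proprietary. It assumes the signals for the modified and unmodified forms of a site are directly comparable, which is not guaranteed for real peptides with different ionisation efficiencies. The function name and the time-course numbers are invented for illustration.

```python
def site_occupancy(modified_signal, unmodified_signal):
    """Fraction of molecules modified (e.g. cleaved or deamidated) at one site.

    Assumes both signals are on the same scale (an illustrative simplification).
    """
    total = modified_signal + unmodified_signal
    if total <= 0:
        raise ValueError("no signal observed for this site")
    return modified_signal / total


# Hypothetical time course for one site: hours -> (modified, unmodified) signal
time_course = {0: (1.0, 9999.0), 24: (50.0, 9950.0), 72: (400.0, 9600.0)}

for hours, (modified, unmodified) in time_course.items():
    occupancy = site_occupancy(modified, unmodified)
    print(f"{hours:>3} h: {occupancy:.2%} modified")
```

Tracking this ratio per site, per time point and per tissue is what turns a list of detected peptides into a picture of how a protein changes after administration.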
Monitoring protein modification in vitro. If you can watch how an individual protein changes with time in vivo, in a complex protein mixture, you can do the same in vitro on an isolated protein. ProQuant™ makes it trivial to quantify and locate labels, for example. If you think you are introducing a site-specific label, only ProQuant™ has the sensitivity to find low-level off-target events with ease.
This can be useful to quantify protein adducts (for example when radiolabelling a protein or preparing antigens for vaccine production). But it can also superpower chemoproteomics, for example quantitating the degree of protein modification with covalent drugs more accurately than ever before, both in simple systems containing just the intended target and in complex biological samples from in vivo studies.
Whether it’s CMC of a complex biological agent, PK for a protein therapeutic that is undergoing modification in vivo, or a purely academic attempt to understand the conformational dynamics associated with activity of an intriguing protein, the availability of ProQuant™ will help the whole pharma industry conduct better R&D.
So how does ProQuant™ work? At a purely practical level, ProQuant™ couldn’t be simpler: a protein sample (whether a recombinant protein in vitro or a blood or tissue sample) is taken through a multi-step preparation to generate a solution containing peptides derived from all the protein molecules present. This peptide solution is then subjected to liquid chromatography-mass spectrometry (LC-MS) on a state-of-the-art Orbitrap mass spectrometer that delivers mass accuracy down to ~1-5 ppm, which for small peptides gives us accuracy below 1% of the mass of a hydrogen atom. The resulting data file is then processed using a proprietary workflow to identify as many of the peptides as possible from a combination of their accurate mass and fragmentation pattern, and to assign an accurate quantitation to the amount of each peptide present in the sample.
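To put the ppm figure in context, the arithmetic behind the hydrogen-atom comparison can be sketched in a few lines. This is only a worked illustration: the peptide masses are arbitrary example values, and the function names are invented here rather than taken from any ProQuant™ software.

```python
HYDROGEN_MASS_DA = 1.00783  # monoisotopic mass of a hydrogen atom, in daltons


def ppm_to_daltons(mass_da, ppm):
    """Absolute mass error (Da) for a relative error given in parts per million."""
    return mass_da * ppm / 1_000_000


def error_vs_hydrogen(mass_da, ppm):
    """Mass error expressed as a fraction of one hydrogen atom's mass."""
    return ppm_to_daltons(mass_da, ppm) / HYDROGEN_MASS_DA


# Illustrative peptide masses (Da) at the two ends of the quoted accuracy range
for mass_da in (800.0, 1000.0, 2000.0):
    for ppm in (1, 5):
        error_da = ppm_to_daltons(mass_da, ppm)
        fraction = error_vs_hydrogen(mass_da, ppm)
        print(f"{mass_da:6.0f} Da at {ppm} ppm: +/-{error_da:.4f} Da "
              f"({fraction:.2%} of a hydrogen atom)")
```

At 5 ppm, a 1,000 Da peptide is measured to within about 0.005 Da, roughly half a percent of a hydrogen atom. At that accuracy the 1% threshold is reached at about 2,000 Da, which is why the claim is framed in terms of small peptides.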
A wide range of proprietary bioinformatic tools are then used to extract knowledge from these ultra-large, ultra-accurate datasets. For example, ProQuant™ can highlight not only which proteins have different levels in samples from different groups, but which ones have the biggest differences in proteoform landscapes. Alternatively, it can report the levels of specific PTMs across all the proteins in the sample. There are even tools for identifying novel PTMs among the peptides whose identity cannot typically be assigned (what we call the “dark matter” in proteomics, which can be as much as half of all the peptides detected).
This in-depth analysis is possible with 10 µg of total protein – and when necessary, ProQuant™ has delivered valuable data with as little as 1 µg, although the depth of coverage and sensitivity for rare events clearly start to be compromised.
There are some methodological decisions that have to be made for each project. The choice of protease (or, occasionally, chemical reagent) used to generate the peptide mixture will make some features of the proteome easier to see and others more challenging. For example, trypsin, the most common enzyme used, cleaves after lysine and arginine residues, making it difficult to study in vivo proteolytic events mediated by proteases that cleave at similar sites – so if you want to study such cleavage events an alternative enzyme such as GluC would be used. Different enzymes also cut with different frequencies, creating mixes of larger or smaller peptides which again offer different advantages (for example, larger peptides are harder to unambiguously identify from their accurate mass, but retain more information about the relationship between spatially separate PTMs within the same protein molecule). Similarly, the duration of cleavage could be varied to reveal information about the conformational state of the target proteins; or else the proteins can be fully denatured and digested to deliberately eliminate any information related to conformation – the variations are endless depending on the particular nature of the question to be addressed.
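The effect of the protease choice can be illustrated with a simple in silico digest. The sketch below uses the textbook cleavage rules as a simplification: trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P), and GluC cuts after glutamate (E). Real digests also contain missed and non-specific cleavages, and GluC specificity depends on buffer conditions. The sequence and function names are invented for illustration and are not part of the ProQuant™ workflow.

```python
def digest(sequence, cleave_after, no_cleave_before=""):
    """Split a protein sequence C-terminal to any residue in `cleave_after`,
    unless the following residue is in `no_cleave_before`."""
    peptides = []
    start = 0
    # Stop one residue early: there is nothing to cleave after the last residue.
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_after and sequence[i + 1] not in no_cleave_before:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def trypsin(sequence):
    """Simplified trypsin rule: after K or R, but not before P."""
    return digest(sequence, cleave_after="KR", no_cleave_before="P")


def gluc(sequence):
    """Simplified GluC rule: after E only."""
    return digest(sequence, cleave_after="E")


# Invented example sequence, not a real protein
EXAMPLE = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

print("trypsin:", trypsin(EXAMPLE))
print("GluC:   ", gluc(EXAMPLE))
```

In a GluC digest, a peptide that ends in lysine or arginine (and is not the protein's own C-terminus) cannot have been produced by the digestion enzyme under these rules, so it points to a cleavage that happened before sample preparation. In a tryptic digest the same peptide end would be indistinguishable from one the enzyme made.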
What makes ProQuant™ so much better than existing proteomics technology? The key technological advances are proprietary, unfortunately. But there is no one single innovation that has delivered the step-change in performance. Instead, many different optimisations to both the data collection and analysis sum together to deliver an order-of-magnitude improvement in analytical reproducibility.
One of the key changes is to the way the mass spectrometer collects data on peptides coming off the chromatography column. A typical sample contains hundreds of thousands of peptides, many hundreds of which may elute from the column in a single second. The mass spectrometer needs to both quantify each and collect enough information on the fragmentation pattern of each (the so-called MS2 scan) to unambiguously identify it. And even the fastest mass spectrometer has to make compromises in the way it does this. ProQuant™ manages this compromise much better than its competitors.
Other important gains come from the proprietary use of machine learning to select and keep only the “real” data from peptides in each sample and ignore artefacts; as well as the way the software generates an estimate of the relative amount of each peptide in the sample.
The gains are not just theoretical: Methuselah Health looked at the quantitative reproducibility of a wide range of competitor methods by analysing replicate sets of samples and comparing the replicate coefficients-of-variation across the whole proteome: ProQuant™ substantially out-performed all existing approaches.
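The comparison metric described above, the replicate coefficient of variation (CV), is straightforward to compute. The sketch below shows one common way to summarise it: a CV per peptide across replicate injections, then the median across all peptides for each method. The summary statistic is an assumption, since the original comparison does not say how CVs were aggregated, and the peptide names and intensities are invented for illustration.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean, for one peptide."""
    return stdev(replicates) / mean(replicates)


def median_cv(peptide_table):
    """Median CV across all peptides quantified by one method."""
    return median(coefficient_of_variation(v) for v in peptide_table.values())


# Hypothetical intensities: peptide -> three replicate measurements
method_a = {
    "PEPTIDE_1": [90.0, 100.0, 110.0],
    "PEPTIDE_2": [180.0, 200.0, 220.0],
    "PEPTIDE_3": [45.0, 50.0, 55.0],
}
method_b = {
    "PEPTIDE_1": [99.0, 100.0, 101.0],
    "PEPTIDE_2": [198.0, 200.0, 202.0],
    "PEPTIDE_3": [49.5, 50.0, 50.5],
}

for name, table in (("method A", method_a), ("method B", method_b)):
    print(f"{name}: median CV = {median_cv(table):.1%}")
```

Because the CV is scale-free, it allows peptides of very different abundance, and methods with different intensity units, to be compared on the same footing.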
The world never stands still, and other approaches have been published to try to address the limitations of LC-MS shotgun proteomics, of which perhaps the best known is Data-Independent Acquisition (or DIA-SWATH), which is quite different in its methodology. But again, in a head-to-head comparison, the same samples were analysed both by ProQuant™ and by a leading supplier of DIA-SWATH – and once more, ProQuant™ came out far ahead.
In the past seven years, out of the public gaze in a small private biotech company, a hidden revolution has taken place in proteomic technology. Over the next seven years, we will see just how much impact these advances can make for drug discovery and development.