Automated Drug Design – an Interview with Chris Radoux, Exscientia
We talked to Chris Radoux, Associate Director of Tractability Informatics at Exscientia, about the role the Cambridge Structural Database (CSD) and CCDC software play in the automated drug design workflows developed at Exscientia.
“My name is Chris Radoux and I’m currently the Associate Director of Tractability Informatics at Exscientia. Before this role, I led the Structural Bioinformatics team, which focused on looking for novel ways of using protein structures to get additional insights for design.”
“Prior to working at Exscientia, I was at the EBI in the ChEMBL group, where I worked on an Open Targets-funded project on incorporating tractability information into the Open Targets Platform. And before that, I was at the CCDC and the University of Cambridge, where I completed my PhD looking into binding hotspots on proteins.”
“I think the best way to describe Exscientia is that it’s a technology and precision medicine company that designs and develops drug candidates. We use AI and computational methods extensively, and have brought six AI-designed molecules into clinical development to date, with another two currently in IND-enabling studies and multiple programs in the discovery phase. However, I wouldn’t say it is just AI: I get to work with some outstanding scientists too. We need to be able to pair the best science and disease knowledge with the most advanced innovative technologies. All of my colleagues are really exceptional at what they do.”
What Do You Understand by the Phrase Automated Drug Design?
“I think there are multiple levels of automation. I like the comparison that I’ve heard made to self-driving cars. Early on in the development, there are systems to recognize pedestrians and other cars. Then the next level is perhaps that the car is able to drive itself around a car park and then roads. Eventually one day you could imagine simply being able to enter your desired destination and just letting the car do all of the work. However, even in that scenario, human involvement is still important at each stage. And it will continue to take human supervision and critical decision making to set future strategy, or solve unforeseen challenges.”
“I see automated drug design in a similar way: you need an expert to tell automated systems where to go. While AI design is well suited for automation, it is quite hard to capture all aspects that prompted a medicinal chemist to come up with an original idea. How were they educated, who influenced them, what scientific articles did they consume recently, who did they talk to at the latest scientific congress… How much did something they heard on the radio or their choice of breakfast cereal that morning influence the compounds that they drew later that day? It is, in comparison, much easier to capture the input parameters of an algorithm.”
“There is also another side in automation, which is the experimental side. We have recently opened our new automated labs for the synthesis and the testing of compounds. This closes the design-make-test loop with a continuous path of automation. There is so much potential in this space, particularly from being able to join together the algorithms we make with robotics and the testing for an overall integrated process that aims to further save precious discovery time. I think right now we’re still driving around the car park, but I’m excited to see where we can go next.”
How Do the CSD and CCDC Software Help?
“Our main interface with the CCDC software is through the CSD Python API, which allows us to incorporate CCDC tools into the workflows that we build at Exscientia in an automated way.”
“An example is how easy it makes it to integrate Mogul into the filtering stage of our automated design pipeline. With these generative methods we can generate a few hundred thousand designs, each with its 3D pose in the binding site of a protein. Mogul allows us to check for unusual torsions, and to quickly eliminate strained conformations, without having to perform high-level computations that wouldn’t be feasible at this scale.”
“Another example refers to our structure-based projects, where we have an automated protein structure preparation workflow which collates all of the structures for both targets and off-targets that can be of use and calculates the fragment hotspot maps. This is what I developed during my PhD at the CCDC along with Tom Blundell. The hotspot maps are based on SuperStar and benefit from the wealth of interaction data that are in the CSD. The beautiful thing here is that the method doesn’t need to define in advance what an interaction is. If there is something that is important, it will just emerge from the data. There have been a few examples of less-than-obvious interactions that could have been missed by looking at the binding site just by eye.”
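Editor's sketch: the torsion-filtering idea described above can be illustrated in a few lines of plain Python. This is not Mogul and does not use the CSD Python API. The function names, the 10-degree bin width, the 5% threshold and the toy reference distribution are all assumptions made for illustration, whereas Mogul draws its distributions from matching fragments in the CSD and applies its own criteria.

```python
import math


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def is_unusual(angle, reference_angles, bin_width=10.0, min_fraction=0.05):
    """Flag an angle that falls in a sparsely populated region of the
    reference distribution (hypothetical criterion, not Mogul's)."""
    n_bins = int(360 / bin_width)

    def bin_index(a):
        return int(((a + 180.0) % 360.0) // bin_width) % n_bins

    q = bin_index(angle)
    # Count the query bin and its two neighbours, wrapping at +/-180 degrees.
    nearby = {(q - 1) % n_bins, q, (q + 1) % n_bins}
    hits = sum(1 for a in reference_angles if bin_index(a) in nearby)
    return hits / len(reference_angles) < min_fraction


def filter_poses(poses, reference_angles):
    """Return the ids of poses in which no torsion is flagged as unusual."""
    kept = []
    for pose in poses:
        angles = [dihedral_deg(*quad) for quad in pose["torsions"]]
        if not any(is_unusual(a, reference_angles) for a in angles):
            kept.append(pose["id"])
    return kept


# Toy reference distribution: mostly trans, some gauche (illustrative only).
reference = [180.0] * 50 + [60.0] * 25 + [-60.0] * 25

poses = [
    {"id": "trans_pose",
     "torsions": [((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))]},
    {"id": "eclipsed_pose",
     "torsions": [((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))]},
]

print(filter_poses(poses, reference))
```

The check costs only a histogram lookup per torsion, which is the property that makes this kind of filter practical for hundreds of thousands of poses where a quantum-mechanical strain calculation would not be.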
What Are the Main Benefits of the Use of the CSD/CCDC in Exscientia’s Research?
“The search for unusual torsion angles in the designed drugs would take an extremely long time if this were a manual process. The CSD and CCDC software allow this process to be automated, quick and reliable.”
“In terms of hotspot maps, they can really help make the decision on where the design should start from, and in understanding what is present in the binding site, and what the key interaction looks like.”
“From the point of view of the interactions data, having such a large number of high-quality experimental crystal structures is invaluable. I mostly work with protein structures, but even when we run hotspot maps, which use SuperStar underneath, we are still using the interactions from the CSD data.”
At Which Step of an Automated Design Workflow Are the CCDC Tools Employed?
“One of the key aspects of this automated design pipeline, and why I believe it works as well as it does, is that it starts off by taking some ligands and trimming them back to just the part of the compound that matches the hotspot within the pocket. It looks at which part of the compound is vital to bind to the pocket, to then allow our generative algorithms to grow that molecule back again.”
“For the fragment hotspot map, which uses SuperStar as a source of atomic interactions, we use the Aromatic CH Carbon probe and the Carbonyl Oxygen probe as acceptors, and the NH Nitrogen probe as a donor. The insights that you get by looking at these maps are very important. You can see the Carbonyl Oxygen probe giving propensity not just to the classical hydrogen bond donors that you would expect, but also around the aromatic CHs of a tryptophan, for example. When this combines with how hydrophobic the rest of that pocket is, that acceptor hotspot in the fragment hotspot map algorithm becomes crucial. This is something that could easily be missed by a docking algorithm or visual inspection, but it’s captured there because it comes from the data and not from an equation.”
“Another of the CCDC tools used at Exscientia is CSD-CrossMiner, important when we need to identify some ligands as a starting point to trim back. When we run a query in CSD-CrossMiner, we know that the search will bring back real 3D conformations of real molecules, and that we can trust the results. In the context of a binding site, large portions of your hits can often clash with the protein or with some other parts of the molecule, which will then be suboptimal. This workflow would just remove anything that is not good for binding to this target and prioritize the kind of fragments that are.”
“Finally, as I mentioned earlier, once we’ve done the big generative step and we’ve got a few hundred thousand compounds, Mogul is a quick tool for checking the 3D poses we have and eliminating anything that could potentially be strained.”
What Is the Value of High-Quality Data for Scientific AI?
“High-quality data is essential for Exscientia and probably many companies like us. Drug discovery consists of a series of predictions that help understand whether a compound may have a positive effect on a person with a disease. Having high-quality data maximizes what we can get from these AI models. Even if the performance impact of the model is small, when this adds up over a whole drug discovery program, it may have a big impact on the results.”
“As someone who has worked with data and built computational methods that use data, having easy computational access to that data is so important. I also like that the CCDC is a strong advocate for the FAIR data principles. As AI and machine learning are becoming more and more important, so will the data.”
What Is the Role of AI at Exscientia?
“At Exscientia, we believe the role of AI is not to replace humans. It is to make humans more productive. In that same spirit, I’m convinced that scientists won’t be replaced by AI, but scientists that use AI will replace those who don’t.”
“There needs to be a partnership between AI and scientists. I can go out and buy a guitar, but I am not going to be able to make any music with it because I don’t know how to play the guitar. In the same way, with AI you need the expert to understand how to use it appropriately – what you can and cannot do – and to tell the algorithms what to do next, where to go next.”
“At Exscientia, we use AI in a multitude of ways: from generating compounds, designing ideas, predicting properties, through to helping select the patients most likely to respond to treatment.”
“I think that AI is an important enabler, but it doesn’t exist in a vacuum. It’s well supported by an incredible organization of experienced and really talented people.”
What Will You Cover in Your June Webinar?
“In the webinar I’ll be covering a piece of work where we put together a fully automated structure-based pipeline to discover novel inhibitors against two kinases, DYRK1B and PKD1. These kinases don’t have crystal structures available in the PDB, so we used the AlphaFold2 models to test if these models would work for design and to investigate what performance we could get.”
“What I mean by automated here is really just a ‘hit the go button’ to set the workflow running and get hundreds of thousands of potential molecules which are then filtered down to the best population of 50 compounds as the output. In the webinar I will be sharing both the structures and the validation data.”
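Editor's sketch: the "filter down to the best 50" step can be pictured as a simple funnel in Python. Everything here is hypothetical, including the candidate records, the "strained" flag, the random scores and the function name. The interview does not say how Exscientia scores or selects compounds, and a real selection would likely weigh diversity as well as score.

```python
import heapq
import random


def select_top(candidates, filters, score, n=50):
    """Drop candidates failing any filter, then keep the n best by score."""
    passing = (c for c in candidates if all(f(c) for f in filters))
    return heapq.nlargest(n, passing, key=score)


# Hypothetical generated designs: a made-up score and a strain flag.
rng = random.Random(0)
candidates = [
    {"id": i, "score": rng.random(), "strained": rng.random() < 0.3}
    for i in range(200_000)
]

best = select_top(
    candidates,
    filters=[lambda c: not c["strained"]],
    score=lambda c: c["score"],
)
print(len(best))
```

Using heapq.nlargest avoids sorting the whole candidate pool, which matters little at this size but keeps memory flat if the candidates are streamed from a generator.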
Who Should Attend and Why?
“This work was not designed to be a kind of a black box. I know that sometimes people can think of AI as a mystery black box and don’t quite know how it works, but what I will present is really just a series of autonomous subsystems that replicate how our designers interact with our tools.”
“I think that anyone interested in drug design and drug discovery will take away a lot from this webinar.”
The webinar was held on June 25th and can be watched on demand at the link below.
Next Steps
Watch on demand the webinar Automated Design of Kinase Inhibitors Using AlphaFold 2 Models presented by Chris Radoux from Exscientia.
Find out more about Exscientia.
To discuss further and/or request a demo with one of our scientists, please contact us via this form.