The difficulty of Method Validation, illustrated with examples taken from Computer-Aided Drug Design

13 February 2020

Professor Paul Finn is a Professorial Research Fellow at The University of Buckingham and the Chief Executive Officer of Oxford Drug Design. In this seminar, Paul explores how methods in drug design need to be validated in order to demonstrate their expected real-world performance, and identifies the issues caused by a lack of data and by insufficient or poorly applied controls.

In his seminar on Friday 31st January, Paul Finn introduced the principle of how drugs work. Each cell in our body contains thousands of proteins, which it needs to survive, grow and produce energy. Sometimes these proteins do not function as they should, causing disease. Once a protein involved in the disease has been identified, the next step is to find a chemical compound (a drug) that will physically interact with the protein and stop or correct its function. The complexity of living organisms makes identifying drug compounds very challenging.

Computer-aided drug design techniques are used for this purpose. They include virtual screening, a name given to a range of computational techniques that aim to identify new active drug compounds. Virtual screening takes as input information on known active compounds and a database of compounds available for purchase and testing, which could contain over 4 million compounds.
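As a rough illustration of the idea, the sketch below ranks a small compound library by similarity to known actives. It is only one simple, ligand-based form of virtual screening and is not taken from the seminar: the compound names, the integer "features" standing in for chemical substructures, and the helper functions `tanimoto` and `rank_library` are all hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_library(known_actives, library):
    """Score each library compound by its highest similarity to any known active."""
    scores = {
        name: max(tanimoto(features, active) for active in known_actives)
        for name, features in library.items()
    }
    # Most similar compounds first: these would be prioritised for testing.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: integers stand in for substructure features.
known_actives = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
library = {
    "cmpd_A": frozenset({1, 2, 3, 4, 6}),
    "cmpd_B": frozenset({7, 8, 9}),
    "cmpd_C": frozenset({2, 3, 5, 9}),
}

ranking = rank_library(known_actives, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a real screen the library would hold millions of purchasable compounds and the features would come from a cheminformatics toolkit, but the principle of scoring and ranking candidates against known actives is the same.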

In the virtual screening process, compounds are selected for the target protein based on a theoretical model of activity. These models need to be validated, but validating virtual screening methods is difficult. For example, there are few reports of negative data (that a compound is inactive against a given target) in the literature. Datasets are therefore often augmented by adding compounds that are presumed to be inactive to the list of inactives. If this is not done with great care, biases can be introduced that lead to exaggerated success in validation studies, success that does not translate to real-world applications.
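One way to see how such a bias can inflate validation results is the sketch below. It uses made-up numbers and assumes, purely for illustration, that the added inactives happen to be lighter molecules than the actives; molecular weight is not named in the seminar and is chosen here only as an example of a trivial property. The score is the area under the ROC curve (AUC), where 1.0 means perfect separation and 0.5 means no better than chance.

```python
def roc_auc(active_scores, inactive_scores):
    """Chance that a random active outscores a random inactive (ties count half)."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical molecular weights in daltons, for illustration only.
actives_mw = [410, 455, 390, 480, 430]
# Inactives added carelessly: all much lighter than the actives.
biased_inactives_mw = [210, 250, 300, 180, 275, 320]
# Inactives chosen to match the weight range of the actives.
matched_inactives_mw = [400, 460, 385, 470, 440, 420]

# "Model" = rank compounds by molecular weight alone, with no knowledge of binding.
biased_auc = roc_auc(actives_mw, biased_inactives_mw)
matched_auc = roc_auc(actives_mw, matched_inactives_mw)

print(f"AUC with biased inactives:  {biased_auc:.2f}")
print(f"AUC with matched inactives: {matched_auc:.2f}")
```

With the biased set, a property unrelated to binding separates actives from inactives perfectly; with the matched set the same "model" is close to chance. Checking how well trivial descriptors perform on a validation dataset is one simple control for this kind of artefact.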

Examples were presented of methods whose validation initially appeared to work well, but whose results were subsequently shown to be artefacts of bias in the validation datasets. The examples included two from deep learning approaches. Convolutional Neural Networks (CNNs) were investigated in the hope that they would be able to “learn” the underlying physics that determines the binding affinity between a compound and its target protein, and thus be generally applicable. However, these methods too can easily learn to distinguish active from inactive compounds through artificial differences arising from dataset construction.

In conclusion, method validation is complicated. Because data volume is often limited, datasets are augmented, but data augmentation techniques can introduce biases of their own. These need careful consideration, as they can cause problems for machine learning.

Professors and lecturers of the School of Computing give postgraduate and undergraduate students an insight into their research, showing them the range of research projects that have been undertaken. This is an opportunity for students to broaden their understanding of computing and identify areas of interest for further study. All are encouraged to attend.
