DeepMind’s AI set to transform research by solving protein structures

DeepMind’s AI solves protein structures

Date: 2nd December 2020

The majority of diseases, including cancer, dementia and infectious diseases such as COVID-19, are related to protein function, which in turn depends directly on the 3-D shape of the protein.  As such, scientists worldwide have been striving for 50 years to find rapid and precise methods to predict this structure.  Now, DeepMind’s artificial intelligence (AI) system, AlphaFold, has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP).  The breakthrough represents a stunning advance in our understanding of how proteins fold and could revolutionise medical research.

The CASP challenge began in 1994 and was inspired by Christian Anfinsen, co-winner of the 1972 Nobel Prize in Chemistry, who was recognised for showing the connection between the amino acid sequence of a protein and its biologically active conformation.

In each round, lab scientists determine the shapes of around 100 proteins experimentally, while in parallel around 100 teams from more than 20 countries are given only the amino acid sequences of these proteins and attempt to predict their structures using computers.  The predictions are then assessed against the experimental structures by independent scientists.

On Monday 30th November, the CASP organisers announced that AlphaFold, the AI program created by DeepMind, had proved capable of accurately determining the shape of many proteins and provided a solution to this 50-year-old question.

DeepMind and AlphaFold

DeepMind is a London-based artificial intelligence company and research laboratory, founded in September 2010 by Demis Hassabis, Mustafa Suleyman and Shane Legg.  It was acquired by Google in 2014.  DeepMind’s long-term aim is to solve intelligence by developing more general and capable problem-solving systems, known as artificial general intelligence (AGI).

AlphaFold uses new deep learning architectures that enable the system to achieve unparalleled levels of accuracy.  It treats a folded protein as a ‘spatial graph’, where residues are the nodes and edges connect residues that are in close proximity.  This graph is crucial for understanding the physical interactions within proteins, and it also reflects their evolutionary history.  AlphaFold uses an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph while reasoning over the implicit graph that it is building.  To refine the graph for each protein, the algorithm uses evolutionarily related sequences, a multiple sequence alignment (MSA) and a representation of amino acid residue pairs.
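To make the ‘spatial graph’ idea concrete, here is a minimal Python sketch that builds such a graph from known residue coordinates. It assumes each residue is represented by a single 3-D point (for example its C-alpha atom) and that ‘close proximity’ means within a hypothetical 8 Å cutoff; the function name `spatial_graph` and the toy coordinates are illustrative only. This is not how AlphaFold works internally: AlphaFold starts from the sequence and reasons over an implicit graph with a learned attention-based network, rather than computing edges from coordinates it already has.

```python
import math
from itertools import combinations

# Hypothetical cutoff (angstroms): residues closer than this count as "in close
# proximity". This value is an illustrative assumption, not AlphaFold's.
CONTACT_CUTOFF = 8.0


def spatial_graph(coords, cutoff=CONTACT_CUTOFF):
    """Return an adjacency map: nodes are residue indices, and an edge joins
    every pair of residues whose coordinates lie within `cutoff`."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Made-up coordinates for a straight four-residue chain, 3.8 angstroms apart.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
g = spatial_graph(toy)
print({node: sorted(nbrs) for node, nbrs in g.items()})
# {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
```

On this straight toy chain each residue links only to residues within two positions of it in the sequence; in a real folded protein, residues far apart in the sequence can also end up connected, which is what makes the graph informative about the fold.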

14th Community Wide Experiment by CASP

The main metric used by CASP to measure the accuracy of predictions is the Global Distance Test (GDT), which ranges from 0 to 100.  Roughly speaking, it is the percentage of amino acid residues that fall within a threshold distance of their correct position.  A score of around 90 GDT is informally considered comparable to results obtained from complex, time-consuming experimental methods.
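As a rough illustration of how a GDT-style score is calculated, the sketch below computes GDT_TS, the commonly reported variant that averages the percentage of residues within 1, 2, 4 and 8 Å of their experimental positions. It assumes the predicted and experimental structures are already superimposed and that residues correspond one-to-one; the official calculation also searches over superpositions to maximise each count, which this sketch omits. The function name `gdt_ts` and the coordinates are hypothetical.

```python
import math

# GDT_TS averages the fraction of residues within these cutoffs (angstroms).
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, reference, cutoffs=GDT_TS_CUTOFFS):
    """Return a 0-100 GDT_TS-style score.

    Assumes both structures are already superimposed and that residue i in
    `predicted` corresponds to residue i in `reference` (illustrative only).
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and of equal length")
    # Distance of each predicted residue from its experimental position.
    deviations = [math.dist(p, r) for p, r in zip(predicted, reference)]
    n = len(deviations)
    # Fraction of residues falling within each threshold distance.
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up C-alpha coordinates: the prediction drifts further off along y.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(predicted, reference))  # 56.25
print(gdt_ts(reference, reference))  # 100.0
```

Here the four residues deviate by 0.5, 1.5, 3.0 and 10.0 Å, so 25%, 50%, 75% and 75% of them fall within 1, 2, 4 and 8 Å respectively, and their average gives a GDT_TS of 56.25.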

In the 14th Community Wide Experiment by CASP, the AlphaFold system achieved a median score of 92.4 GDT across all targets, and a median score of 87.0 GDT even for the most challenging proteins.  These results amount to a successful solution to the challenge for the DeepMind team.

Conclusions and future applications

There is much excitement among scientists worldwide about this first-in-class AI method for predicting protein structure.  It is being hailed as an extraordinary achievement, and one that could completely change the face of medicine.  It is expected to have a huge impact on research, saving much of the time and resources currently devoted to determining protein structures with lab-based methods.

The team is currently collaborating with a small number of specialist groups to explore how protein structure predictions could contribute to our understanding of specific diseases, for example by helping to identify dysfunctional proteins and to determine whether these changes affect their interactions.  These insights are expected to enable more precise work on drug development, complementing existing experimental methods to find promising treatments faster.  Indeed, AlphaFold was able to predict the shapes of several coronavirus proteins soon after the virus was first sequenced in January this year.

Whilst AlphaFold now offers us the chance to accurately predict the structures of single proteins, the next challenge to tackle is predicting the shapes of protein complexes.  This will be an important milestone in understanding the machinery of life and how proteins work together.

What clearly emerges from this work is how powerful AI is becoming, and that it is one of our most useful tools for expanding the frontiers of scientific knowledge.  From AI-driven ‘smart’ cell therapy and AI robots for treating cancer, to AI-driven solutions for COVID-19 and machine learning tools for predicting best-in-class base editors, AI is providing rapid, precise answers to many of our most crucial biological questions, and it is accelerating and shaping translational research through to the clinic.


For more information, please see the press releases from CASP and DeepMind.