Machine learning predicts best-in-class gene base editor


Date: 18th June 2020

Article in brief:

Base editors (BEs) are potent tools for precise genome editing, and can be used to correct single disease-causing mutations.  New BEs are being developed at an astonishing rate, and finding the optimal one for a potential therapeutic or a specific edit can be difficult.  Now scientists have developed a machine learning (ML) model that accurately predicts base editing outcomes and determines which BE is ‘best-in-class’.

BEs chemically change one DNA base into another without creating a double-stranded DNA break.  They were developed by David Liu and Alexis Komor at Harvard University, US, in 2016, the first being a cytosine BE (CBE).  A year later, Nicole Gaudelli, a researcher in the Liu lab, developed the first adenine BE (ABE).  The subsequent BE boom has left us with a plethora to choose from and, as with many biological tools, each editor has its own personality.  However, the factors that determine base editing outcomes are not well understood.  For example, the editing window can range from 2 to 5 base pairs (bp), and some BEs might modify one bp whilst others modify two.

Now, Liu and his team have turned their attention to cataloguing BEs, designing and training a machine learning model that predicts each base editor’s particular ‘personality’, allowing researchers to choose the BE best suited to a particular gene sequence.

The team started by generating experimental data, editing 38,538 target sites in human and mouse cells with 11 of the most popular BEs paired with guide RNAs.  After treatment, they sequenced the DNA, collecting billions of data points on how each BE edited each target site.  These outcomes were then used to train BE-Hive, a machine learning model developed by Max Shen, a co-first author of the paper.

Once BE-Hive was trained, the researchers could enter a target genomic DNA sequence and the model would predict, down to the DNA sequence level, the distribution of products that each base editor would generate at that target site.
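BE-Hive itself is a trained machine learning model and is not reproduced here. To make the idea of a ‘distribution of products’ concrete, the Python sketch below uses a far simpler toy model: it assumes that each target C at a position in a hypothetical editing window is converted to T independently, with a made-up per-position probability. This is not how BE-Hive works; the function name `outcome_distribution`, the window positions and the probabilities are all hypothetical.

```python
from itertools import product


def outcome_distribution(protospacer, edit_probs, target="C", result="T"):
    """Toy outcome model (NOT BE-Hive): every `target` base at a 1-based
    position listed in `edit_probs` is converted to `result` independently
    with that position's probability. Returns {outcome sequence: probability}."""
    sites = [pos for pos, base in enumerate(protospacer, start=1)
             if base == target and pos in edit_probs]
    dist = {}
    # Enumerate every edited/unedited combination of the editable sites
    for pattern in product([False, True], repeat=len(sites)):
        seq = list(protospacer)
        prob = 1.0
        for pos, edited in zip(sites, pattern):
            q = edit_probs[pos]
            if edited:
                seq[pos - 1] = result
                prob *= q
            else:
                prob *= 1 - q
        outcome = "".join(seq)
        dist[outcome] = dist.get(outcome, 0.0) + prob
    return dist


# Hypothetical window: positions 4 and 5, with made-up editing probabilities
dist = outcome_distribution("GGGCC" + "G" * 15, {4: 0.5, 5: 0.25})
for seq, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(seq, round(prob, 3))
```

Two target Cs give four possible products, including the unedited sequence. A real predictor also has to capture how edits at neighbouring positions influence one another, which this toy model ignores.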

The team used BE-Hive to correct 3,388 pathogenic single-nucleotide variants (SNVs) with ≥90% precision.  Many of these had previously been considered intractable, but the model correctly predicted otherwise.

BE-Hive was also able to predict transversion edits, rare and atypical outcomes that were previously unpredictable but potentially valuable.  Using BE-Hive, the team corrected the coding sequences of 174 pathogenic transversion SNVs with ≥90% precision, choosing the optimal BE for each edit.
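Choosing the ‘optimal BE’ for an edit can be pictured as scoring each editor’s predicted outcome distribution and keeping the best one. The Python sketch below is a minimal illustration, not BE-Hive’s code: it assumes that precision means the share of edited outcomes that match the desired allele, and the editor names (`editor_A`, `editor_B`), sequences and probabilities are hypothetical.

```python
def precision(dist, original, desired):
    """Fraction of edited outcomes (anything other than the original
    sequence) that equal the desired allele. Assumed definition."""
    edited = {seq: p for seq, p in dist.items() if seq != original}
    total = sum(edited.values())
    if total == 0:
        return 0.0  # no edits predicted, so no precise correction either
    return edited.get(desired, 0.0) / total


def best_editor(predictions, original, desired, threshold=0.9):
    """Pick the editor whose predicted distribution gives the highest
    precision; also report whether it clears the threshold."""
    scores = {name: precision(dist, original, desired)
              for name, dist in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name], scores[name] >= threshold


# Hypothetical predicted outcome distributions (made-up numbers)
predictions = {
    "editor_A": {"ACCA": 0.40, "ATCA": 0.30, "ATTA": 0.30},
    "editor_B": {"ACCA": 0.50, "ATCA": 0.475, "ACTA": 0.025},
}
name, score, ok = best_editor(predictions, original="ACCA", desired="ATCA")
print(name, round(score, 3), ok)
```

Here `editor_B` would be chosen: about 95% of its predicted edits are the desired correction, whereas half of `editor_A`’s edits also change a bystander C.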

Lastly, the team used insights from BE-Hive to discover previously unknown BE properties, which they then used to engineer new CBE variants that either increase or reduce aberrant transversion editing.

Conclusion and future applications:

BEs are becoming valuable tools and are proving especially powerful for correcting single-nucleotide mutations that cause disease.  Although they have not yet entered clinical trials, that milestone is drawing ever closer.  Now, a machine-learning-based, searchable library of BEs that is free to the public and outperforms human predictions brings the technology nearer still to the clinic.  BE-Hive enables researchers to select editors in an informed and highly predictable way.

Only this week we reported that David Liu’s lab used BEs in vivo to repair a single-nucleotide mutation in mice carrying a TMC1 mutation, partially restoring hearing.  It has also been demonstrated that multiplexed precise base editing works efficiently in primates, enabling the editing of up to three target sites simultaneously.  With safety and efficiency at the top of the list of concerns for new technologies entering the clinic, BE-Hive can help address both.

The work here has helped to illuminate individual base editor personalities and nuances, and BE-Hive can accurately predict both base editing efficiencies and editing patterns.  The only question that remains is: how quickly will this AI-driven technology accelerate BEs’ route to the clinic?


For more information, see the press release from Harvard University.

Arbab, M., M. W. Shen, B. Mok, C. Wilson, Ż. Matuszek, C. A. Cassa and D. R. Liu “Determinants of Base Editing Outcomes from Target Library Analysis and Machine Learning.” Cell.