AI predicts over 20,000 unknown links between viruses and mammals

Date: 30th June 2021

Knowing the potential host range of a virus is essential for efforts to mitigate the global burden of viral diseases. Host range is also an important predictor of zoonosis, yet our knowledge of it remains limited. The data we do have are biased towards humans and domesticated mammals and, as we have seen with the current pandemic, viruses such as SARS-CoV-2 can have a broad range of hosts, including bats, cats, ferrets and pangolins, which may have facilitated transmission to humans. Now, researchers have used artificial intelligence (AI) to predict more than 20,000 unknown associations between known viruses and susceptible mammalian species.

Thousands of viruses are known to affect mammals, yet recent estimates indicate that less than 1% of mammalian viral diversity has been discovered to date. Whilst some viruses have a very narrow range of hosts, others, such as rabies, have a broad host range and are in theory able to infect any mammal. The majority of emerging infectious diseases in humans, such as Ebola, HIV and now COVID-19, are zoonotic, caused by viruses originating in wild mammals. Understanding patterns of viral diversity in wildlife and the determinants of successful cross-species transmission, also called spillover, is an essential part of global pandemic surveillance. However, few analytical tools exist to identify which host species are likely to harbour the next virus to cause a human pandemic, and which viruses can spill over.

Now, researchers at the University of Liverpool, UK, led by Maya Wardeh, have used machine learning to integrate mammalian and viral traits with network features to predict virus-mammal associations. They found over 20,000 unknown associations between known viruses and mammals, most of them in wild and semi-domesticated species, where the predicted total is about five times the number previously known. Furthermore, bats and rodents were linked with an increased risk of zoonotic viruses.

The team started by developing a novel machine learning framework that consolidates three distinct perspectives. The first is that of each mammal, based for example on the traits of the viruses known to infect it. The second is that of each virus, based for example on the traits of the mammalian species in which that virus has been found to date. The third is that of the network linking known viruses with their mammalian hosts.
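To make the idea of consolidating three perspectives concrete, here is a toy sketch in Python. It is not the authors' implementation: the function name, the example scores, the simple averaging and the 0.5 cut-off are all assumptions made purely for illustration, and the published framework may combine its models quite differently.

```python
from statistics import mean


def combine_perspectives(mammal_score: float,
                         virus_score: float,
                         network_score: float) -> float:
    """Average three probability-like scores for one virus-mammal pair.

    Each argument is a hypothetical score between 0 and 1 produced by a
    model built from one perspective (mammal, virus or network).
    Plain averaging is an assumption, not the method used in the paper.
    """
    scores = [mammal_score, virus_score, network_score]
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie between 0 and 1")
    return mean(scores)


# Made-up scores for a single candidate virus-mammal pair.
score = combine_perspectives(0.8, 0.6, 0.7)

# Hypothetical decision rule: flag the pair as a likely association
# when the combined score reaches 0.5.
predicted = score >= 0.5
print(f"combined score: {score:.2f}, predicted association: {predicted}")
```

The point of the sketch is only that a pair can be scored from each perspective separately and the results then merged into a single prediction.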

The framework utilised 6,331 associations between 1,896 viruses and 1,436 terrestrial mammals, representing 0.23% of all possible associations between these mammals and viruses. The team used the algorithm to determine how far these known associations underestimate the true number, by predicting which unknown species-level associations are likely to exist in nature, or already exist but are undocumented. The model predicted that 20,832 unknown associations potentially exist between mammals and their known viruses, with 18,920 of these in wild or semi-domesticated mammals. This represents a more than 4-fold increase in virus-mammal associations overall, and a nearly 5-fold increase in wild and semi-domesticated mammals, compared with what is currently known.
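The headline figures can be checked with a few lines of arithmetic. The short Python sketch below uses only the numbers quoted above. It assumes that the fold increase means the known plus predicted associations divided by the known ones, which is our reading rather than something stated in the study. The equivalent figure for wild and semi-domesticated mammals cannot be recomputed here because the number of known associations for that subset is not given.

```python
# Figures quoted in the article.
known_associations = 6_331
n_viruses = 1_896
n_mammals = 1_436
predicted_unknown = 20_832

# Every virus paired with every mammal gives the pool of possible associations.
possible_pairs = n_viruses * n_mammals

# Share of possible associations that are already documented.
known_share = known_associations / possible_pairs

# Assumed definition of fold increase: (known + predicted) / known.
fold_increase = (known_associations + predicted_unknown) / known_associations

print(f"possible pairs: {possible_pairs:,}")      # 2,722,656
print(f"known share: {known_share:.2%}")          # 0.23%
print(f"fold increase: {fold_increase:.2f}")      # 4.29
```

Under that reading, the 6,331 known associations cover about 0.23% of the 2,722,656 possible pairs, and adding the 20,832 predicted ones gives roughly 4.3 times the known total, in line with the "more than 4-fold" figure.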

The model also predicted a more than 5-fold increase in associations between wild and semi-domesticated mammals and viruses of economically important domestic species such as livestock and pets. Bats and rodents, in particular, were linked with an increased risk of zoonotic viruses, suggesting that targeted surveillance of these species might be a worthwhile approach in the future.

Conclusion and future applications

The team have shown that AI can highlight large numbers of potentially missing associations between viruses that cause disease of medical and veterinary importance and their potential hosts. With more than 20,000 unknown associations now predicted, the work suggests that our previous knowledge was relatively limited.

These virus-host interaction data will facilitate the identification and mitigation of future zoonotic risk, including spillover from animals into humans. With this in mind, the team is looking to extend their approach to incorporate arthropod vectors and intermediate hosts, and to include different classes of pathogens and hosts. Birds in particular are likely to be an important addition to the strategy, as they are known to be important reservoirs or amplifying hosts for viruses such as flaviviruses.

The AI field is currently moving at an astonishing rate and is changing the medical field. Much of the AI work is focused on driving diagnostics and predicting disease outcomes. We have seen AI being leveraged to predict mortality from echocardiograms, to diagnose autism from maternal biomarkers, to diagnose early-stage breast cancer, and to predict the dynamics of brain networks in response to microstimulation. It is accelerating drug discovery and is even identifying new subtypes of certain diseases. The work here adds to these groundbreaking discoveries, giving us an opportunity to identify potential zoonotic risks and to help us avoid or limit widespread devastation of the kind we are currently experiencing in the ongoing pandemic.


For more information please see the press release from the University of Liverpool


Wardeh, M., Blagrove, M.S.C., Sharkey, K.J., and Baylis, M. (2021). Divide-and-conquer: machine-learning integrates mammalian and viral traits with network features to predict virus-mammal associations. Nature Communications 12, 3954.