Machine learning nanosensor platform detects early cancer biomarkers

Date: 30th November 2021

Early cancer detection is key to improving patient outcomes. However, for many malignancies, such as ovarian cancer, symptoms can be mild or absent in the early stages of the disease, which often results in late diagnosis after the cancer has already metastasised. Furthermore, there is currently no effective screening test for ovarian cancer; together, these factors make ovarian cancer one of the most common causes of cancer-related death in women. Now, researchers have developed a perception-based platform, built on an optical nanosensor array, that leverages machine learning (ML) algorithms to detect multiple protein biomarkers in biofluids, which could allow ovarian cancer to be detected early.

Current biomolecular identification methods rely heavily on one-to-one molecular recognition, in which highly specific binders such as antibodies, peptides and aptamers bind to their target analytes in biofluids. However, this approach brings many challenges, including the need to develop binding moieties sensitive and specific enough to detect low quantities of target molecules, as well as limitations in long-term stability, robustness and production. These challenges are especially limiting for antibodies, which can be 'tricky' to develop, because a separate antibody is needed for each biomarker when detecting multiple disease biomarkers. Technologies that replace antibodies could open new avenues for diagnostics and could potentially be invaluable.

One such technology being explored is the perception-based ML platform. These platforms are inspired by the complex olfactory system, which can distinguish individual signals using an array of relatively nonspecific receptors. Each receptor captures certain features, but the overall ensemble response is analysed by the brain, resulting in perception. 'Electronic' noses and optical noses have previously been investigated, but these have a limited ability to detect biomolecules under physiological conditions in complex biofluids.
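
To make the perception idea more concrete, here is a toy Python sketch of pattern-based identification. It is an illustration only, not the method used in the study: four hypothetical, relatively nonspecific "receptors" each respond to every analyte to some degree, and an unknown sample is identified by which stored ensemble pattern it most resembles, using cosine similarity. The analyte names, the number of receptors and all response values are invented for illustration.

```python
import math

# Hypothetical reference patterns (invented numbers): every analyte produces a
# response on EVERY receptor, so no single receptor is specific, but the
# overall ensemble pattern differs between analytes.
REFERENCE_PATTERNS = {
    "analyte_A": [0.9, 0.4, 0.1, 0.3],
    "analyte_B": [0.2, 0.8, 0.5, 0.3],
    "analyte_C": [0.3, 0.3, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Compare the shape of two response patterns, ignoring overall magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identify(sample):
    """Return the reference analyte whose pattern best matches the sample."""
    return max(
        REFERENCE_PATTERNS,
        key=lambda name: cosine_similarity(sample, REFERENCE_PATTERNS[name]),
    )


# A weaker, slightly distorted version of analyte_B's pattern is still recognised.
print(identify([0.12, 0.41, 0.27, 0.14]))  # prints: analyte_B
```

Cosine similarity is used here because it compares the shape of the ensemble response rather than its absolute size, loosely mirroring how a pattern can be recognised at different concentrations.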

Now, researchers at Memorial Sloan Kettering Cancer Center, US, led by Daniel Heller, have developed photoluminescent sensor arrays based on DNA-wrapped single-walled carbon nanotubes (DNA-SWCNTs), whose optical responses were used to train ML models to detect a panel of gynaecologic cancer biomarkers.

The team started by creating 132 distinct DNA-SWCNT complexes, combining 11 single-stranded DNA oligonucleotides with 12 SWCNT chiralities (species), to form the sensor array. SWCNTs emit near-infrared (NIR) photoluminescence in distinct, narrow emission bands. Because each species has a distinct bandgap, the species differ in their sensitivity to redox phenomena, and the complexes gave distinct responses to interactions with different biomarkers, likely due to structural differences between the proteins such as size, charge, hydrophobicity and level of glycosylation. These optical responses were analysed via high-throughput NIR spectroscopy.
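
The arithmetic of the array (11 DNA sequences × 12 chiralities = 132 sensors) and the idea of turning each sensor's emission peak into numerical features can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the sequence and chirality labels are placeholders rather than those used in the study, and the two features per sensor (peak wavelength shift and relative intensity change) are chosen to reflect the peak position and intensity changes the team reports, not the exact preprocessing in the paper. `response_features` is a hypothetical helper.

```python
from itertools import product

# Placeholder identifiers (assumption): the real study used 11 specific ssDNA
# sequences and 12 specific (n,m) chiralities, which are NOT reproduced here.
DNA_SEQUENCES = [f"ssDNA_{i}" for i in range(1, 12)]    # 11 oligonucleotides
CHIRALITIES = [f"chirality_{j}" for j in range(1, 13)]  # 12 SWCNT species

# Each (DNA wrapper, nanotube species) pair is treated as one distinct sensor.
SENSORS = list(product(DNA_SEQUENCES, CHIRALITIES))


def response_features(baseline, exposed):
    """Build one feature vector from per-sensor emission peaks (hypothetical helper).

    baseline, exposed: dicts mapping each sensor to a tuple
    (peak_wavelength_nm, peak_intensity), measured before and after adding a
    sample. Returns, in SENSORS order, the wavelength shift (nm) followed by
    the relative intensity change for every sensor.
    """
    features = []
    for sensor in SENSORS:
        wl_before, int_before = baseline[sensor]
        wl_after, int_after = exposed[sensor]
        features.append(wl_after - wl_before)                   # peak shift in nm
        features.append((int_after - int_before) / int_before)  # relative intensity change
    return features


print(len(SENSORS))  # 132 = 11 x 12
```

With two features per sensor, one measurement yields a 264-element vector, which is the kind of input an ML model can be trained on.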

However, to determine whether the DNA-SWCNT array could correctly identify analytes within a complex environment such as a biofluid, the team turned to ML. The ML algorithms were trained on initial datasets of the arrays' optical responses to a variety of gynaecologic cancer biomarkers, such as HE4, CA-125 and YKL-40, in laboratory-generated samples, and the team screened over 17 different biomarker combinations. The experiments showed that the models could detect both single and multiple biomarkers in the mixtures with high precision. The team saw distinct changes in fluorescence peak position and intensity for each DNA-SWCNT combination in response to the protein analytes, which enabled prediction of the presence and concentration of each biomarker.
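
Detecting several biomarkers at once is a multi-label problem: each sample can contain any combination of HE4, CA-125 and YKL-40. A minimal Python sketch of one generic way to handle this is shown below, training one independent logistic-regression classifier per biomarker by gradient descent. This is an assumption for illustration, not the models used in the study, and the toy training data are invented: three synthetic "cross-reactive" features, each responding mainly to one marker and weakly to another.

```python
import math
from itertools import product

BIOMARKERS = ["HE4", "CA-125", "YKL-40"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(X, y, lr=0.5, epochs=5000):
    """Fit one logistic-regression classifier by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the gradient of the average log-loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_multilabel(X, Y):
    """Train one independent classifier per biomarker (one-vs-rest)."""
    return [train_binary(X, [row[k] for row in Y]) for k in range(len(BIOMARKERS))]


def predict(models, x, threshold=0.5):
    """Return the biomarkers predicted present in one sample, sorted by name."""
    return sorted(
        name
        for name, (w, b) in zip(BIOMARKERS, models)
        if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold
    )


# Invented toy data: all 8 presence/absence combinations of the three markers.
# Each feature responds mainly to one marker and weakly (0.3) to another.
X, Y = [], []
for p in product([0, 1], repeat=3):
    X.append([p[0] + 0.3 * p[1], p[1] + 0.3 * p[2], p[2] + 0.3 * p[0]])
    Y.append(list(p))

models = train_multilabel(X, Y)
print(predict(models, X[5]))  # sample with HE4 and YKL-40 present: ['HE4', 'YKL-40']
```

Training a separate classifier per biomarker keeps the sketch simple and lets any combination of markers be reported; predicting concentrations, as the team also did, would instead call for a regression model.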

With these data showing promise, the team then turned to patient samples. Here, they used uterine lavage samples from consenting patients with malignancies such as ovarian and endometrial cancer. They found that their optimised method enabled the simultaneous detection of multiple biomarkers, with F1-scores of ~0.95 in uterine lavage samples from patients with cancer. By comparing the level of each biomarker measured by the clinical laboratory with the platform's predictions, they demonstrated classification success rates of 100% for HE4 and CA-125 and 91% for YKL-40 in cancer patient samples. This suggests that a perception-based nanosensor system could accurately detect multiple disease biomarkers in patient biofluids.
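
The F1-score quoted above is the harmonic mean of precision and recall, F1 = 2 × precision × recall / (precision + recall), so it is high only when the platform both avoids false positives and misses few true positives. The short Python sketch below computes it for one biomarker alongside a simple accuracy figure. We assume here, for illustration, that a "classification success rate" corresponds to the fraction of samples whose biomarker status was called correctly; the labels in the example are invented.

```python
def f1_score(y_true, y_pred):
    """F1 for one biomarker (1 = present, 0 = absent): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples whose presence/absence call matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented labels: clinical-lab reference vs. hypothetical platform predictions.
reference = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
print(f1_score(reference, predicted))  # 5 TP, 1 FP, 1 FN -> about 0.833
print(accuracy(reference, predicted))  # 8 of 10 correct -> 0.8
```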

Conclusions and future applications

The team have developed a new approach for detecting multiple biomarkers in biofluids for disease diagnosis using an artificial molecular perception system. Because the DNA-SWCNT sensors in the array are relatively nonspecific, the platform produces a wide diversity of responses when exposed to different target proteins, in this case the gynaecologic cancer biomarkers HE4, CA-125 and YKL-40. ML algorithms trained on the DNA-SWCNT spectral response data detected these biomarkers with high accuracy in both laboratory-generated samples and uterine lavage samples from cancer patients.

Looking ahead, the team will increase the number of patient samples to continue validating the model and improving its robustness. They will also develop the platform for translation to the clinic, for use in laboratory medicine or even point-of-care settings. Whilst high-throughput NIR screening was used here to train the ML algorithms, they note that simpler optical instrumentation could be used by reducing the optical configuration to a few excitation wavelengths and fewer SWCNTs, albeit at the expense of the number of examples analysed. However, this would allow simpler, more portable optical devices to be used at the point of care.
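
One generic way to decide which sensors to keep in a reduced panel is a simple univariate filter: rank each sensor by how differently it responds to samples with and without a biomarker, then keep the top few. The Python sketch below illustrates that idea only; it is an assumption, not the authors' selection procedure, and the helper names, sensor names and values are hypothetical.

```python
def rank_sensors(X, y, sensor_names):
    """Rank sensors by |mean response with biomarker - mean response without| (hypothetical helper)."""
    positives = [x for x, label in zip(X, y) if label == 1]
    negatives = [x for x, label in zip(X, y) if label == 0]
    scores = {}
    for j, name in enumerate(sensor_names):
        mean_pos = sum(x[j] for x in positives) / len(positives)
        mean_neg = sum(x[j] for x in negatives) / len(negatives)
        scores[name] = abs(mean_pos - mean_neg)
    # Most discriminating sensors first.
    return sorted(sensor_names, key=lambda name: scores[name], reverse=True)


def reduced_panel(X, y, sensor_names, k):
    """Keep only the k most discriminating sensors for a simpler device."""
    return rank_sensors(X, y, sensor_names)[:k]


# Invented responses of three sensors to two biomarker-positive and two negative samples.
X = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.5], [0.1, 0.1, 0.7], [0.2, 0.2, 0.1]]
y = [1, 1, 0, 0]
print(reduced_panel(X, y, ["s1", "s2", "s3"], k=2))  # ['s2', 's3']
```

A filter like this ignores interactions between sensors, which is one reason a smaller panel may trade away some of the information the full array captures.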

One of the great strengths of the system is its flexibility, which stems from the nonspecific nature of its individual sensor elements. It is therefore not limited to detecting ovarian cancer biomarkers: it can be trained to detect other disease biomarkers, or even potentially be used for disease fingerprinting, without the need to engineer different arrays of nanosensors.

This ML-driven DNA-SWCNT sensing method will add to ongoing efforts to rapidly and accurately detect biomarkers, not only for disease but also, for example, to analyse food components and to detect airborne and liquid-based toxins and environmental hazards. One such tool in development is the 'bio-electronic tongue/nose', in which powerful insect smell receptors are combined with graphene-based biology-gated transistors to detect volatile organic compounds. Using similar technology, the EV-Chip detects and quantifies exosome biomarkers from liquid biopsies, whilst the SNP-Chip detects disease-causing single-nucleotide mutations in a target DNA sequence. Machine learning is also being leveraged in other diagnostic applications, such as identifying autism from maternal biomarkers or diagnosing deep vein thrombosis.

This DNA-SWCNT platform may be especially valuable where robust or long-term measurements are required, and the team will be looking at incorporating the nanosensors into wearable or implantable devices for these applications. For example, people with a strong family history of disease or with known disease-causing mutations could benefit greatly from an early warning system for disease progression, enabling early detection and rapid action that could save many lives.


Yaari, Z., Yang, Y., Apfelbaum, E., Cupo, C., Settle, A.H., Cullen, Q., Cai, W., Roche, K.L., Levine, D.A., Fleisher, M., et al. (2021). A perception-based nanosensor platform to detect cancer biomarkers. Science Advances 7, eabj0852.