A while back I started a paper discussion series with the idea of describing the scientific process undertaken by the authors and extracting the message or result they found. This is the third instalment of the series, and it is about an article in the highly regarded journal Science.
Cancer is one of the leading causes of death worldwide. In 2015 it caused the death of 8.8 million people, with lung cancer as the most common cause. If you are trying to cure cancer, the trick is not necessarily taking the cancerous cells out of the body; it is finding them in the first place. As the authors state:
The majority of localized cancers can be cured by surgery alone, without any systemic therapy.
If you can prevent metastasis, the spreading of cancerous cells throughout the body, then there is no need for cytotoxic drugs or immunotherapy. Sounds pretty simple, right? Find the location of the cancer cells and just cut them out.
But finding the cancer cells is exactly where the problem lies. Cancerous cells are easy to find once it is already too late, because by then they are distributed throughout the body. So how do you find them earlier? The thing with cancer cells is that they are still your cells, and they often look similar to your healthy cells, except for the fact that they show uncontrolled growth.
The researchers figured that all cells interact with the bloodstream, and that includes the cancer cells. Cancers give off certain biomarkers into the blood and also carry certain genetic alterations. The authors cite work by Bettegowda et al. and Cohen et al. showing that blood assays can be used to detect these biomarkers. They wanted to combine these results into a generalized test suitable for wide application. They set themselves four criteria the test should adhere to:
- The test should query a sufficient number of bases, which are the single units of a gene; in essence, its building blocks. If a large number of cancers is to be detected, then the number of tested bases also has to be high.
- Every single base should be tested thousands of times, which is made possible by amplifying the DNA with polymerase chain reaction (PCR). In a blood sample, a mutated base might show up in only 1% of the DNA fragments covering that position (or fewer), while the remaining 99% look normal. These mutations are known as low-prevalence mutations.
- The number of tested bases should have a limit to decrease the number of artefactual mutations. These are by-products of the PCR reaction used to amplify the bases, also known as jackpot mutations. Since it is out of the scope of this discussion you can find more information here and here.
- The test should be cheap and capable of high throughput, so that it can be used in as many places as possible.
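The second criterion can be made concrete with a bit of arithmetic. Below is a minimal sketch, assuming a simple binomial model in which every read of a position independently comes from a mutant fragment with a fixed probability. The function name, the `min_mutant_reads` threshold, and the 0.1% mutant fraction are my own illustrative choices, not numbers from the paper, and the model ignores sequencing and PCR errors.

```python
from math import comb

def detection_probability(mutant_fraction, reads, min_mutant_reads=1):
    """Chance that at least `min_mutant_reads` of `reads` independent reads are mutant.

    Simple binomial model: every read is mutant with probability `mutant_fraction`.
    """
    # Probability of seeing fewer mutant reads than the required minimum.
    p_too_few = sum(
        comb(reads, k) * mutant_fraction**k * (1 - mutant_fraction) ** (reads - k)
        for k in range(min_mutant_reads)
    )
    return 1.0 - p_too_few

# Illustrative numbers only: a mutation present in 0.1% of the fragments.
for depth in (100, 1000, 10000):
    print(depth, round(detection_probability(0.001, depth), 3))
```

The point of the sketch is the trend: at a low read depth a rare mutation is usually missed, and only at a depth of thousands of reads does it become likely to show up at all. Demanding several mutant reads before calling a mutation (a higher `min_mutant_reads`) guards against artefacts but pushes the required depth up further.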
To find the optimum number of genetic markers to test for, they looked through publicly available data for the best way to detect the eight types of tumours. In the end they applied some math (for those interested, they found a fractional power law relationship between the number of markers and the sensitivity of detection) and found that 60 amplicons, the short PCR-amplified stretches of DNA that contain the queried bases, was the lucky number. Beyond that point, adding more markers brought little extra sensitivity.
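For those who want to see what a fractional power law looks like in practice, here is a minimal sketch. It assumes the relationship has the form sensitivity = a * n^b with 0 < b < 1, where n is the number of amplicons. The data points are synthetic and generated by me for illustration; they are not the values from the paper, and `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b, done as a straight line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the log-log line is the exponent b.
    b = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / sum(
        (u - mean_x) ** 2 for u in log_x
    )
    # Intercept of the log-log line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic, made-up data following sensitivity = 0.2 * n**0.3.
amplicons = [1, 5, 10, 20, 40, 60, 120]
sensitivity = [0.2 * n**0.3 for n in amplicons]

a, b = fit_power_law(amplicons, sensitivity)
print(f"a = {a:.3f}, b = {b:.3f}")

# Diminishing returns: compare the gain from the first 60 amplicons
# with the gain from doubling the panel to 120.
gain_first_60 = a * 60**b - a * 1**b
gain_next_60 = a * 120**b - a * 60**b
print(gain_first_60 > gain_next_60)
```

Because the exponent is smaller than one, each extra amplicon adds less than the one before it, which is why a modest panel can capture most of the achievable sensitivity.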
Now to the second part of CancerSEEK. Circulating tumour DNA (ctDNA) is not always released in the early stages of cancer development, but protein biomarkers often are! The authors tested 41 different protein biomarkers. They found that 39 could be reliably evaluated with a single run of their immunoassay platform. Further sample testing showed that 8 biomarkers were particularly good at discriminating between people with and without cancer.
In the end their assay tests for 8 proteins and evaluates 2001 genomic positions for mutations to detect eight common types of cancer.
Does it work?
They tested their final design of CancerSEEK on 1005 patients diagnosed with Stage I to III cancers. The diagnosed cancer types were ovary, liver, stomach, pancreas, esophagus, colorectum, lung, and breast cancer. They added 812 healthy people to the sample pool as controls. The result is in figure 1.
Figure 1: Proportion of cancers detected by CancerSEEK (%). The error bars in the graph show the 95% confidence interval. From Cohen et al.
It is clear that ovary and liver cancer are detected in the large majority of cases, while breast cancer has a quite low detection probability at ~30%. Overall the results are already quite good: high detection chances over multiple stages of cancer for at least 7 out of 8 common cancers! But the authors weren’t done yet.
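A short aside on the error bars. A 95% confidence interval for a detected proportion can be computed in several ways. The sketch below uses the Wilson score interval, which is my assumption; I am not claiming it is the method behind the figure. The counts (33 detected out of 100) are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(detected, total, confidence=0.95):
    """Wilson score confidence interval for a proportion detected / total."""
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = detected / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Hypothetical example: 33 of 100 cancers detected.
low, high = wilson_interval(33, 100)
print(f"{low:.3f} to {high:.3f}")
```

The interval shrinks as the number of patients grows, so a cancer type with few patients in the study will show wide error bars even if its detection rate is high.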
Could it be possible to use the results of CancerSEEK to localize the cancer source? In an attempt to answer this question the authors brought in the help of artificial intelligence. Supervised machine learning was used to train a model to identify the cancer location based on the assay result. Very generally this works by giving an AI some questions together with their answers. It then does some magic and out comes an incomprehensible algorithm that can answer new questions similar to the ones you asked, only now without knowing the answers up front. Here I have to suggest this video by CGP Grey about machine learning.
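To take a little of the magic out of it, here is a minimal sketch of supervised learning using a nearest-centroid classifier. This is a deliberately simple stand-in that I picked for illustration; it is not the model used in the paper. The two "marker levels" per sample and the organ labels are made-up toy data, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def train_centroids(samples, labels):
    """Learning step: average the feature vectors that share a label."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def rank_labels(centroids, features):
    """Prediction step: order the labels from closest to farthest centroid."""
    return sorted(centroids, key=lambda label: math.dist(centroids[label], features))

# Toy training set: the "questions" (two made-up marker levels) and "answers" (organ).
samples = [[9.0, 1.0], [8.5, 1.5], [1.0, 9.0], [1.5, 8.0], [5.0, 5.0], [5.5, 4.5]]
labels = ["liver", "liver", "ovary", "ovary", "lung", "lung"]

centroids = train_centroids(samples, labels)

# A new sample whose answer the model has never seen.
ranking = rank_labels(centroids, [8.0, 2.0])
print(ranking)
```

The first entry of the ranking is the model's best guess and the second is its runner-up, which is the kind of output you need if you want to narrow the source down to one or two organs.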
The AI was able to pin down the source of the cancer to two organs for 83% of the patients, and to a single organ for 63% of the patients. Quite an impressive result.
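These two percentages are what machine learning people would call top-2 and top-1 accuracy: how often the true organ is among the model's two best guesses, or is its single best guess. A minimal sketch of that calculation follows, using four hypothetical patients with made-up rankings; none of these values come from the paper.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases where the true label is among the first k predictions."""
    hits = sum(
        truth in ranking[:k]
        for ranking, truth in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)

# Hypothetical model output: organs ordered from most to least likely.
ranked_predictions = [
    ["liver", "lung"],
    ["lung", "ovary"],
    ["ovary", "liver"],
    ["lung", "liver"],
]
true_labels = ["liver", "ovary", "ovary", "breast"]

top1 = top_k_accuracy(ranked_predictions, true_labels, k=1)
top2 = top_k_accuracy(ranked_predictions, true_labels, k=2)
print(top1, top2)
```

Top-2 accuracy can never be lower than top-1 accuracy, since allowing a second guess only adds chances to be right.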
I think this article really shows the power of collaborative science. Different groups show that individual biomarkers and genetic alterations can be used to identify certain cancers. Then the information gets collected and aggregated, and out comes a blood assay that doubles as a detection and localization test for eight common cancers.
This post was a discussion based on the following paper:
Cohen, J. D., Li, L., Wang, Y., Thoburn, C., Afsari, B., Danilova, L., … Papadopoulos, N. (2018). Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science, 359(6378), 926–930. https://doi.org/10.1126/science.aar3247
World Health Organization