Multi-dimensional fragmentomic assay for ultrasensitive early detection of colorectal advanced adenoma and adenocarcinoma

Ma, Xiaoji; Chen, Yikuan; Tang, Wanxiangfu; Bao, Hua; Mo, Shaobo; Liu, Rui; Wu, Shuyu; Bao, Hairong; Li, Yaqi; Zhang, Long; Wu, Xue; Cai, Sanjun; Shao, Yang; Liu, Fangqi; Peng, Junjie

doi:10.1186/s13045-021-01189-w

Letter to the Editor
Open access
Published: 26 October 2021

Volume 14, article number 175, (2021)
Journal of Hematology & Oncology

Xiaoji Ma^1,2^na1,
Yikuan Chen^1,2^na1,
Wanxiangfu Tang^3^na1,
Hua Bao^3,
Shaobo Mo^1,2,
Rui Liu^3,
Shuyu Wu^3,
Hairong Bao^3,
Yaqi Li^1,2,
Long Zhang^1,2,4,
Xue Wu^3,
Sanjun Cai^1,2,4,
Yang Shao^3,
Fangqi Liu^1,2 &
Junjie Peng^1,2

Abstract

Previous studies on liquid biopsy-based early detection of advanced colorectal adenoma (advCRA) or adenocarcinoma (CRC) were limited by low sensitivity. We performed a prospective study to establish an integrated model that uses fragmentomic profiles of plasma cell-free DNA (cfDNA) to detect early-stage CRC and advCRA accurately and cost-effectively. The training cohort comprised 310 participants: 149 early-stage CRC patients, 46 advCRA patients and 115 healthy controls. Plasma cfDNA samples were prepared for whole-genome sequencing. An ensemble stacked model differentiating healthy controls from advCRA/early-stage CRC patients was trained on the training cohort using five machine learning models and five cfDNA fragmentomic features. The model was subsequently validated using an independent test cohort (N = 311; 149 early-stage CRC, 46 advCRA and 116 healthy controls). In this test cohort, the model showed an area under the curve (AUC) of 0.988 for differentiating advCRA/early-stage CRC patients from healthy individuals. The model performed even better for identifying early-stage CRC (AUC 0.990) than advCRA (AUC 0.982). At 94.8% specificity, the sensitivities for detecting advCRA and early-stage CRC reached 95.7% and 98.0% (stage 0: 94.1%; stage I: 98.5%), respectively. Notably, detection sensitivity reached 100% and 97.6% in early-stage CRC patients with negative fecal occult blood test or CEA blood test results, respectively. Finally, the model maintained promising performance (AUC 0.982, 94.4% sensitivity at 94.8% specificity) even when sequencing depth was down-sampled to 1X. Our integrated predictive model demonstrated an unprecedented detection sensitivity for advCRA and early-stage CRC, shedding light on more accurate noninvasive CRC screening in clinical practice.

To the editor

Recently, researchers have focused on utilizing plasma cell-free DNA (cfDNA), including cfDNA fragmentomic profiles, to develop noninvasive approaches for detecting solid malignancies such as colorectal adenocarcinoma (CRC) [1,2,3,4,5,6]. However, current detection methods rely on either a single molecular feature or a single algorithm, and their limited sensitivity reduces their potential use in clinical practice. An ensembled stacked machine learning approach can improve robustness and accuracy [7, 8]. Herein, we constructed a multi-dimensional ensembled stacked machine learning approach, employing five different base models on five optimized fragmentation features, to provide an ultrasensitive and cost-effective model for detecting early-stage CRC and advanced adenoma (advCRA).

In this study, 149 early-stage CRC patients, 46 advCRA patients and 115 healthy volunteers were recruited from a single center into the training cohort, which was used to train the machine learning models (Figs. 1, 2A). To eliminate the potential impact of differing coverage on predictive power and to maximize affordability, WGS data were down-sampled to 4X coverage unless otherwise noted. The test cohort (N = 311), which consisted of 149 early-stage CRC patients, 46 advCRA patients and 116 healthy controls, was used to evaluate model performance. ROC curves were constructed using five individual features, namely Fragment Size Ratio (FSR), Fragment Size Distribution (FSD), EnD Motif (EDM), BreakPoint Motif (BPM) and Copy Number Variation (CNV), as well as the DELFI fragment size pattern [1] and the 4-bp end-motif pattern by Jiang et al. [2], to demonstrate the advantage of using a multi-dimensional ensembled stacked machine learning model and of adapting existing fragmentation features [7]. Detailed methodology is described in the supplementary methods section (Additional file 1).
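The stacking idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the letter defers the actual base algorithms and feature definitions to the supplementary methods, so this example assumes a simple nearest-centroid scorer as a stand-in base learner (one per feature group), a logistic-regression meta-learner, and synthetic Gaussian data. All function names (`make_data`, `centroid_scorer`, `stack_fit`, `stack_predict`) are hypothetical.

```python
import math
import random

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def make_data(n, n_groups=5, dim=3, shift=0.6, seed=1):
    """Synthetic stand-in for five feature groups (assumption, not real cfDNA data)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]  # 1 = case, 0 = control
    groups = [[[rng.gauss(shift * t, 1.0) for _ in range(dim)] for t in y]
              for _ in range(n_groups)]
    return groups, y

def centroid_scorer(X, y):
    """Stand-in base learner: higher score means closer to the case centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    mu1 = mean([x for x, t in zip(X, y) if t == 1])
    mu0 = mean([x for x, t in zip(X, y) if t == 0])
    def score(x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, mu1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, mu0))
        return d0 - d1
    return score

def fit_logistic(Z, y, lr=0.1, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1])
            for j, zj in enumerate(z):
                grad[j] += (p - t) * zj
            grad[-1] += p - t
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return w

def stack_fit(groups, y, k=5, seed=0):
    """Train base scorers per feature group and a meta-learner on out-of-fold scores."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    Z = [[0.0] * len(groups) for _ in range(n)]  # out-of-fold base scores
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        for g, X in enumerate(groups):
            scorer = centroid_scorer([X[i] for i in train], [y[i] for i in train])
            for i in held:
                Z[i][g] = scorer(X[i])
    meta = fit_logistic(Z, y)
    base = [centroid_scorer(X, y) for X in groups]  # refit on all training samples
    return base, meta

def stack_predict(base, meta, sample_groups):
    """Probability that one sample (a list of per-group vectors) is a case."""
    z = [scorer(x) for scorer, x in zip(base, sample_groups)]
    return sigmoid(sum(w * v for w, v in zip(meta, z)) + meta[-1])

groups, y = make_data(200, seed=1)
base, meta = stack_fit(groups, y)
first_sample = [g[0] for g in groups]
print(f"stacked probability for sample 0: {stack_predict(base, meta, first_sample):.3f}")
```

Training the meta-learner on out-of-fold base scores, rather than on scores from models that have already seen the same samples, is the usual way to keep the second stage from simply rewarding overfitted base models.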

The ensembled stacked model had a higher AUC (0.988) than base models using any individual feature (AUC range 0.881–0.981), validating the multi-dimensional ensembled stacked approach (Additional file 1: Fig. S1). A similar pattern was observed for sensitivity: at 94.8% specificity (95% CI 89.1–98.1%), the ensembled stacked model had the highest sensitivity for detecting advCRA/early-stage CRC (97.4%, 95% CI 94.1–99.2%) compared with all base models (sensitivity range 57.4–89.2%) (Additional file 1: Fig. S1, Table S1). Additionally, our adaptation of the existing fragmentation features was justified by better performance than the original features: the adapted 6-bp EDM feature showed a higher AUC (0.981, 95% CI 0.969–0.993) than the original 4-bp end-motif feature (0.969, 95% CI 0.953–0.985), while models using FSR or FSD both had a higher AUC (0.881, 95% CI 0.843–0.919; 0.892, 95% CI 0.855–0.930) than the original DELFI fragment pattern (Additional file 1: Fig. S1).
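For readers unfamiliar with how the AUC values compared above are obtained, the following sketch computes AUC directly from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The scores are hypothetical toy values, not data from the study, and `roc_auc` is an illustrative name.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for three cases and three controls.
case_scores = [0.9, 0.8, 0.4]
control_scores = [0.5, 0.2, 0.1]
print(f"AUC = {roc_auc(case_scores, control_scores):.3f}")
```

The pairwise loop is quadratic in cohort size, which is fine for a few hundred samples; a rank-based implementation would be preferable for much larger cohorts.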

The stacked model achieved a higher AUC when differentiating early-stage CRC (0.990, 95% CI 0.981–0.998) than advCRA (0.983, 95% CI 0.968–0.999) from healthy controls (Fig. 2B). Similarly, the model showed excellent sensitivities for detecting both advCRA (95.7%, 95% CI 85.2–99.5%) and early-stage CRC (98.0%, 95% CI 94.2–99.6%) at 94.8% specificity (95% CI 89.1–98.1%) (Fig. 2D). The advCRAs more closely resembled early-stage CRCs than healthy controls (Fig. 2C), which was further validated by two additional models (Additional file 1: Fig. S2A, S2B). Following a positive prediction by our model, colonoscopy, the current gold standard, can be used to distinguish advCRA from early-stage CRC histopathologically.
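The reported percentages and intervals can be reproduced approximately with an exact (Clopper-Pearson) binomial interval. The sketch below makes two assumptions that the letter does not state: that the intervals are Clopper-Pearson intervals, and that the underlying counts are 44/46 advCRA, 146/149 early-stage CRC and 110/116 healthy controls, which are back-calculated here from the reported percentages and cohort sizes. The function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    def solve(f, target):
        # f decreases monotonically in p on [0, 1]; bisect for f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound solves P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# Counts are inferred from the reported percentages, not taken from the letter.
inferred_counts = {
    "advCRA sensitivity": (44, 46),
    "early-stage CRC sensitivity": (146, 149),
    "specificity (healthy controls)": (110, 116),
}
for label, (k, n) in inferred_counts.items():
    lo, hi = clopper_pearson(k, n)
    print(f"{label}: {100 * k / n:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")
```

The wide interval for advCRA relative to early-stage CRC follows directly from the smaller group size (46 versus 149), which is worth keeping in mind when comparing the two sensitivities.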

We then constructed an ensembled stacked model using the raw-depth NGS data (4.7–24.04X, median 9.75X), which still performed well, with an identical AUC of 0.988 (95% CI 0.979–0.997) (Additional file 1: Fig. S3, Table S2). A limit of detection analysis was performed by further down-sampling the 4X coverage WGS data to 3X, 2X, 1X and 0.5X. The down-sampled data were then used to evaluate the 4X model. The AUCs decreased gradually during the down-sampling process (0.988, 0.987, 0.985, 0.982 and 0.977 for 4X, 3X, 2X, 1X and 0.5X data, respectively) (Additional file 1: Fig. S4A).
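Down-sampling to a target coverage is commonly done by keeping each fragment at random with probability equal to the ratio of target to current coverage. The letter does not say which tool or procedure was used, so the sketch below is only an assumption about the general technique; `downsample_fraction` and `downsample` are hypothetical names, and integers stand in for sequenced fragments.

```python
import random

def downsample_fraction(current_cov, target_cov):
    """Fraction of fragments to keep when going from current to target coverage."""
    if target_cov > current_cov:
        raise ValueError("target coverage exceeds current coverage")
    return target_cov / current_cov

def downsample(fragments, current_cov, target_cov, seed=0):
    """Keep each fragment independently with probability target_cov / current_cov."""
    rng = random.Random(seed)
    keep = downsample_fraction(current_cov, target_cov)
    return [f for f in fragments if rng.random() < keep]

# Example: thinning a 4X sample to the coverages used in the limit of detection analysis.
fragments = list(range(100_000))  # placeholder fragment identifiers
for target in (3.0, 2.0, 1.0, 0.5):
    kept = downsample(fragments, current_cov=4.0, target_cov=target)
    print(f"{target}X: kept {len(kept)} of {len(fragments)} fragments")
```

Fixing the random seed makes the thinning reproducible, which matters when the same down-sampled data are reused to evaluate a model trained at a different depth.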

In summary, our multi-dimensional ensembled stacked model, which uses plasma cfDNA WGS data, demonstrated high sensitivity in detecting early-stage CRC as well as advCRA, showing great potential for accurate noninvasive colorectal cancer screening ahead of colonoscopy, the current gold standard in clinical practice. However, this study had limitations, chiefly the relatively small cohort size. The small number of healthy controls within the test cohort can impact the model performance, likely resulting in an underestimation of sensitivity. A large-scale, multicenter prospective study is needed to further validate the clinical value of our methods.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

advCRA: Advanced colorectal adenoma
CRC: Colorectal adenocarcinoma
cfDNA: Cell-free DNA
WGS: Whole-Genome Sequencing
AUC: Area Under the Curve
CNV: Copy number variation
DELFI: DNA EvaLuation of Fragments for early Interception
FSR: Fragment Size Ratio
FSD: Fragment Size Distribution
EDM: EnD Motif
BPM: BreakPoint Motif

References

1. Cristiano S, Leal A, Phallen J, Fiksel J, Adleff V, Bruhm DC, Jensen SO, Medina JE, Hruban C, White JR, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385–9.
2. Jiang P, Sun K, Peng W, Cheng SH, Ni M, Yeung PC, Heung MMS, Xie T, Shang H, Zhou Z, et al. Plasma DNA end-motif profiling as a fragmentomic marker in cancer, pregnancy, and transplantation. Cancer Discov. 2020;10(5):664–73.
3. Rasmussen SL, Krarup HB, Sunesen KG, Johansen MB, Stender MT, Pedersen IS, Madsen PH, Thorlacius-Ussing O. Hypermethylated DNA, a circulating biomarker for colorectal cancer detection. PLoS ONE. 2017;12(7):e0180809.
4. Luo H, Zhao Q, Wei W, Zheng L, Yi S, Li G, Wang W, Sheng H, Pu H, Mo H, et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci Transl Med. 2020;12(524):7533.
5. Jin S, Zhu D, Shao F, Chen S, Guo Y, Li K, Wang Y, Ding R, Gao L, Ma W, et al. Efficient detection and post-surgical monitoring of colon cancer with a multi-marker DNA methylation liquid biopsy. Proc Natl Acad Sci USA. 2021;118(5):985–9.
6. Wan N, Weinberg D, Liu TY, Niehaus K, Ariazi EA, Delubac D, Kannan A, White B, Bailey M, Bertin M, et al. Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA. BMC Cancer. 2019;19(1):832.
7. Zhang C, Ma Y. Ensemble machine learning: methods and applications. New York: Springer; 2012.
8. Kwon H, Park J, Lee Y. Stacking ensemble technique for classifying breast cancer. Healthc Inform Res. 2019;25(4):283–8.

Acknowledgements

We would like to thank the patients and family members who consented to the presentation of their data in this study, as well as the investigators and research staff involved in this study.

Funding

This work was supported by grants from the National Natural Science Foundation of China (U1932145 to Junjie Peng), Science and Technology Commission of Shanghai Municipality (18401933402 to Junjie Peng), National Natural Science Foundation of China (82002946 to Yaqi Li) and Shanghai Sailing Program (19YF1409500 to Yaqi Li).

Author information

Xiaoji Ma, Yikuan Chen and Wanxiangfu Tang have contributed equally to this work

Authors and Affiliations

Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, 270 Dong’an Road, Xuhui, Shanghai, 200032, China
Xiaoji Ma, Yikuan Chen, Shaobo Mo, Yaqi Li, Long Zhang, Sanjun Cai, Fangqi Liu & Junjie Peng
Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
Xiaoji Ma, Yikuan Chen, Shaobo Mo, Yaqi Li, Long Zhang, Sanjun Cai, Fangqi Liu & Junjie Peng
Geneseeq Research Institute, Nanjing Geneseeq Technology Inc, Room 1702 Building B Phase I Zhongdan Eco Life Sci Ind Park, Nanjing, 210032, Jiangsu, China
Wanxiangfu Tang, Hua Bao, Rui Liu, Shuyu Wu, Hairong Bao, Xue Wu & Yang Shao
Department of Cancer Institute, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, 200032, China
Long Zhang & Sanjun Cai

Contributions

JP, FL and YS conceptualized the study and provided guidance throughout. XM and YC performed the experiments, analyzed the data and extensively edited the manuscript. WT performed the computational analysis and wrote the manuscript. XM, YC and WT contributed equally to this work. SM, YL and LZ collected patient samples and documented clinical information. RL, SW and Hairong Bao ran the bioinformatics pipeline. Hua Bao and XW made significant revisions to the manuscript. SC provided thoughtful input on the study design. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yang Shao, Fangqi Liu or Junjie Peng.

Ethics declarations

Ethics approval and consent to participate

All study protocols were approved by the ethics committee of the Fudan University Shanghai Cancer Center, the Shanghai Cancer Center Institutional Review Board (SCCIRB), and were conducted in accordance with international standards of good clinical practice. Written informed consent was provided by all patients.

Consent for publication

The content of this manuscript has not been previously published and is not under consideration for publication elsewhere.

Competing interests

Wanxiangfu Tang, Hua Bao, Rui Liu, Shuyu Wu, Hairong Bao, Xue Wu and Yang Shao are employees of Nanjing Geneseeq Technology Inc., Nanjing, Jiangsu, China. The remaining authors have nothing to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary methods. Supplementary Results. Supplementary Figures.

Figure S1. Evaluation of base model using individual features. Figure S2. Evaluation of models distinguishing advCRA from early-stage CRC or healthy controls. Figure S3. Evaluation of model constructed using raw coverage WGS data. Figure S4. Evaluation of a multi-dimensional model detecting advCRA/early-stage CRC. Figure S5. Evaluation of age and gender matched groups in the test cohort. Figure S6. Evaluation of model using 10-fold cross-validation score of the training cohort. Supplementary Tables. Table S1. Performances evaluation of base models using different features. Table S2. Evaluating performances of model constructed by raw depth data in the test dataset. Table S3. Participant demographics and baseline characteristics. Table S4. Clinical information of the colorectal advanced adenoma (advCRA) and Adenocarcinoma (CRC) patients.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ma, X., Chen, Y., Tang, W. et al. Multi-dimensional fragmentomic assay for ultrasensitive early detection of colorectal advanced adenoma and adenocarcinoma. J Hematol Oncol 14, 175 (2021). https://doi.org/10.1186/s13045-021-01189-w

Received: 10 August 2021
Accepted: 12 October 2021
Published: 26 October 2021
DOI: https://doi.org/10.1186/s13045-021-01189-w
