Journal of Southern Medical University ›› 2020, Vol. 40 ›› Issue (10): 1493-1499. doi: 10.12122/j.issn.1673-4254.2020.10.16


Association analysis of tumor susceptibility variants in haplotype amplification regions

耿 彧,杨蓉蓉,张 静   

  • Publication date: 2020-10-20 Release date: 2020-10-20

An improved association analysis pipeline for tumor susceptibility variant in haplotype amplification area

  • Online: 2020-10-20 Published: 2020-10-20

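Abstract: Objective Haplotype amplification among genetic variants, because of its potential selective advantage and sensitivity to clonal evolution, has become an important marker in the search for cancer susceptibility genes. This study takes full account of the factors that influence haplotype amplification status in order to carry out rare-variant association analysis effectively. Methods Haplotype amplification status was estimated from variant allele frequencies. A permutation test was first used to cluster candidate variant sites based on their variant allele frequencies. A likelihood clustering method was then applied to determine the neighborhood system of the hidden Markov random field model. In addition, a combined filtering mechanism consisting of a Wilson interval and a false discovery rate was introduced to further improve the accuracy of variant site identification. Finally, the candidate set and the haplotype amplification status were merged into weighted virtual sites for association analysis. Results In simulation experiments, the type I error rates at different minor allele frequencies were compared and remained essentially stable within 2%. In comparisons of type I and type II error rates against 5 other association analysis methods, both the type I and type II error rates were kept within 2%, showing a marked advantage and good statistical power. Conclusion The proposed association analysis method for tumor susceptibility variants in haplotype amplification regions can identify such variants fairly accurately, shows good robustness and stability, and can provide decision support for clinical diagnosis.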

Keywords: cancer genomics, disease association analysis, rare variants, haplotype amplification

Abstract: Objective Haplotype amplification of germline variants is thought to indicate potential selective advantages and susceptibility to clonal expansion, and has become an important signature in the search for cancer susceptibility genes. Here we propose an improved association method that fully considers the haplotype amplification status. Methods The haplotype amplification status was estimated from the variant allelic frequencies. We applied a permutation test on variant allelic frequencies to divide the candidate variants into multiple groups. A likelihood clustering method was then used to establish the neighborhood system of the hidden Markov random field framework. A filtering pipeline, consisting of a Wilson interval filter and a false discovery rate controller, was introduced to further refine the candidate variants. The final candidate set, together with the haplotype amplification status, was collapsed into weighted virtual sites for association tests. Results In simulation tests on a series of datasets, the type I error rates at different minor allele frequencies remained stably within 2%, suggesting good robustness of the algorithm. We also compared the proposed method with 5 other published association approaches in terms of type I and type II error rates; both error rates of the proposed method stayed within 2%, demonstrating a clear advantage and good statistical power. Conclusion The proposed method can accurately identify tumor susceptibility variants in haplotype amplification areas with good robustness and stability.

Key words: cancer genomics, variant association method, rare variants, haplotype amplification
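
The filtering step named in the Methods (a Wilson interval filter combined with a false discovery rate controller) can be illustrated with a minimal Python sketch. The abstract does not give the actual filtering criteria, so everything specific here is an assumption made for illustration: the function names, the input layout (alternate-allele reads, read depth, p-value per site), the use of the Benjamini-Hochberg procedure as the FDR controller, and the rule that a site is kept only when its Wilson interval for the variant allelic frequency excludes the balanced value 0.5.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; True marks a rejected hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Remember the largest rank whose p-value is under its threshold.
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


def filter_candidates(sites, balanced_vaf=0.5, alpha=0.05):
    """Hypothetical combined filter over sites given as (alt_reads, depth, p_value).

    Assumed rule (not taken from the paper): keep a site only if it passes
    the FDR control AND its Wilson interval excludes `balanced_vaf`.
    Returns the indices of the kept sites.
    """
    passed_fdr = benjamini_hochberg([s[2] for s in sites], alpha)
    kept = []
    for idx, ((alt, depth, _), ok) in enumerate(zip(sites, passed_fdr)):
        low, high = wilson_interval(alt, depth)
        if ok and not (low <= balanced_vaf <= high):
            kept.append(idx)
    return kept
```

Under these assumptions, a call such as `filter_candidates([(90, 100, 0.001), (50, 100, 0.001), (85, 100, 0.9)])` keeps only the first site: the second has a balanced allelic frequency, and the third does not pass the false discovery rate control.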