Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

Paper Title

Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

Authors

Qin, Yun; Zhu, Fei; Xi, Bo

Abstract

DNA has immense potential as an emerging data storage medium. The principle of DNA storage is the conversion and flow of digital information between binary code streams, quaternary bases, and actual DNA fragments. This process inevitably introduces errors, posing challenges to accurate data recovery. Sequence reconstruction consists of inferring the DNA reference from a cluster of erroneous copies. A common assumption in existing methods is that all the strands within a cluster are noisy copies originating from the same reference, and therefore contribute equally to the reconstruction. However, this assumption does not always hold, given the existence of contaminated sequences caused, for example, by DNA fragmentation and rearrangement during the DNA storage process. This paper proposes a robust multi-read reconstruction model using a deep neural network (DNN), which is resilient to contaminated clusters with outlier sequences, as well as to noisy reads with insertion, deletion, and substitution (IDS) errors. The effectiveness and robustness of the method are validated on three next-generation sequencing datasets, where a series of comparative experiments are performed by simulating the varying contamination levels that occur during the process of DNA storage.
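
To make the problem setting concrete, the following is a minimal sketch of how a contaminated cluster with IDS errors might be simulated. It is not the authors' pipeline: the function names (`add_ids_noise`, `make_cluster`), the error rates, and the choice to model contamination as noisy copies of unrelated random strands are all assumptions made for illustration. The abstract attributes contamination to fragmentation and rearrangement, which this toy model does not reproduce.

```python
import random

BASES = "ACGT"


def add_ids_noise(ref, p_ins, p_del, p_sub, rng):
    """Return a noisy copy of `ref` with insertion, deletion, and substitution errors.

    Hypothetical error model: each base is handled independently.
    """
    out = []
    for base in ref:
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insert a random base before this one
        r = rng.random()
        if r < p_del:
            continue  # deletion: drop this base
        if r < p_del + p_sub:
            # substitution: replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_cluster(ref, n_reads, contamination, rng,
                 p_ins=0.01, p_del=0.01, p_sub=0.01):
    """Build a cluster of noisy reads plus a parallel list of labels.

    A fraction `contamination` of the reads are noisy copies of an unrelated
    random strand (an assumed stand-in for outlier sequences); the remaining
    reads are noisy copies of `ref`. A label is True for a genuine read.
    """
    n_bad = round(n_reads * contamination)
    reads, labels = [], []
    for i in range(n_reads):
        if i < n_bad:
            source = "".join(rng.choice(BASES) for _ in range(len(ref)))
            labels.append(False)
        else:
            source = ref
            labels.append(True)
        reads.append(add_ids_noise(source, p_ins, p_del, p_sub, rng))
    # Shuffle so that outliers are not grouped at the front of the cluster.
    order = list(range(n_reads))
    rng.shuffle(order)
    return [reads[i] for i in order], [labels[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "".join(rng.choice(BASES) for _ in range(100))
    reads, labels = make_cluster(reference, n_reads=10, contamination=0.3, rng=rng)
    for read, genuine in zip(reads, labels):
        print("genuine" if genuine else "outlier", len(read), read[:30])
```

A reconstruction method that weights every read in such a cluster equally would be pulled toward the outlier strands as `contamination` grows, which is the failure mode the abstract describes.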
