New Quality Control Methods for Genome Sequencing Data

An international team from Cardio-CARE in Davos, the University Hospital Zurich, and the University Medical Center Hamburg-Eppendorf outlines crucial steps for obtaining high-quality data in large-scale genetic studies.

The project aims to explore the genetic factors contributing to cardiovascular and other diseases. The work is based on the largest whole-genome sequencing study in the German-speaking regions. It involves more than 9,000 Hamburg residents who participated in the world’s largest local cohort study, the Hamburg City Health Study (HCHS).

Samples were processed at the University Hospital Zurich using high-end Illumina technology. A challenge in such studies is the size of the raw data, which exceeds the storage capacity of 1,000 laptops. Cardio-CARE stores, processes, and analyzes all data on its local high-performance computer. To reduce storage needs, Cardio-CARE is the first group to publish on an advanced compression technology developed by Illumina, which reduced the storage footprint to less than 20% of its original size. This not only saves physical space but also makes data management more efficient.

Now that the genetic data have been validated through quality control measures, the research teams can move on to the next exciting phase: identifying associations between individuals’ DNA and disease.

The sequencing was financed by the Kühne Foundation. The full study is available in the Biometrical Journal:
Betschart, R.O., Riccio, C., Aguilera-Garcia, D., Blankenberg, S., Guo, L., Moch, H., Seidl, D., Solleder, H., Thalén, F., Thiéry, A., Twerenbold, R., Zeller, T., Zoche, M. and Ziegler, A. (2024) Biostatistical aspects of whole genome sequencing studies: preprocessing and quality control. Biometrical Journal, 66: e202300278. DOI: 10.1002/bimj.202300278