With the recent acquisition of two advanced long-read sequencing platforms, the ONT PromethION and PacBio Revio, at my workplace, many researchers are now eager to dive into whole-genome sequencing for their species of interest. To help guide these efforts, I’ve compiled insights from my readings and research surveys on the essentials of whole-genome assembly. Here are the key points:

  1. Reads and Coverage
    • Aim for at least 15x accurate HiFi long-read coverage per haplotype. This ensures enough depth for high-confidence assembly.
    • Adding approximately 5x ultra-long-read (ONT) coverage per haplotype can significantly improve assembly contiguity, as longer reads help span repetitive regions.
    • Adding 30x Hi-C long-range data aids scaffolding and phasing.
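
To turn these coverage targets into a sequencing budget, here is a minimal sketch in Python. It assumes that "per haplotype" means the target depth is multiplied by the ploidy, and that the 30x Hi-C figure is counted against the haploid genome size. Both are my reading of the guidelines rather than a fixed rule. The function names and the 1 Gb example genome are hypothetical.

```python
def required_bases(haploid_genome_size_bp: int, coverage: int, ploidy: int = 1) -> int:
    """Total bases needed to reach `coverage`x on each of `ploidy` haplotypes."""
    return haploid_genome_size_bp * coverage * ploidy


def sequencing_plan(haploid_genome_size_bp: int, ploidy: int = 2) -> dict:
    """Yield targets (in bases) for the three data types listed above."""
    return {
        # 15x HiFi per haplotype
        "hifi_bp": required_bases(haploid_genome_size_bp, 15, ploidy),
        # 5x ultra-long ONT per haplotype
        "ultra_long_bp": required_bases(haploid_genome_size_bp, 5, ploidy),
        # 30x Hi-C, assumed here to be relative to the haploid genome size
        "hic_bp": required_bases(haploid_genome_size_bp, 30),
    }


if __name__ == "__main__":
    # Hypothetical diploid species with a 1 Gb haploid genome
    for name, bases in sequencing_plan(1_000_000_000).items():
        print(f"{name}: {bases / 1e9:.0f} Gb")
```

For this hypothetical 1 Gb diploid genome, the plan works out to 30 Gb of HiFi, 10 Gb of ultra-long ONT, and 30 Gb of Hi-C data.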
  2. Assemblers
    • hifiasm: Easy to set up and run; has a short runtime.
    • Verkko / LJA: Easy to set up and run, but errors were observed on the first attempt.
    • Canu: Simple to use and successfully produced a complete bacterial genome. However, for larger genomes such as mouse, runtime was a challenge.
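
As an illustration of how little setup hifiasm needs, here is a rough Python sketch that assembles the command line for a run. The flags (`-o`, `-t`, `--ul`, `--h1`, `--h2`) are the ones I know from the hifiasm README, so check `hifiasm -h` for your installed version. The helper name, file names, and thread count are hypothetical.

```python
import shlex


def hifiasm_command(hifi_reads, out_prefix, threads=32,
                    ul_reads=None, hic_r1=None, hic_r2=None):
    """Build a hifiasm argument list; optional inputs are added only if given."""
    if (hic_r1 is None) != (hic_r2 is None):
        raise ValueError("Hi-C reads must be supplied as a pair (R1 and R2)")

    cmd = ["hifiasm", "-o", out_prefix, "-t", str(threads)]
    if ul_reads is not None:
        cmd += ["--ul", ul_reads]                  # ultra-long ONT reads
    if hic_r1 is not None:
        cmd += ["--h1", hic_r1, "--h2", hic_r2]    # paired Hi-C reads for phasing
    cmd.append(hifi_reads)                         # HiFi reads go last
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run() to launch the job
    cmd = hifiasm_command("sample.hifi.fq.gz", "sample.asm",
                          ul_reads="sample.ul.fq.gz",
                          hic_r1="hic_R1.fq.gz", hic_r2="hic_R2.fq.gz")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps file names with spaces safe when the list is handed to `subprocess.run`.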
  3. Evaluation Metrics
    • Assembly Size
    • N50
    • BUSCO
    • K-mer-Based Evaluation: Merqury or KAT
    • Alignment-Based Evaluation: QUAST
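
The first two metrics are easy to compute by hand, which helps when sanity-checking a tool's report. The sketch below uses the standard N50 definition: the length of the contig at which the running total, counted from the longest contig down, first reaches half the assembly size. The function names and the toy contig lengths are hypothetical. In practice the lengths would come from the assembly FASTA.

```python
def n50(contig_lengths):
    """Contig length at which the cumulative sum (longest first) reaches 50% of the total."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer comparison avoids float rounding
            return length


def assembly_summary(contig_lengths):
    """Basic contiguity statistics for a list of contig lengths (in bp)."""
    return {
        "assembly_size": sum(contig_lengths),
        "n_contigs": len(contig_lengths),
        "largest_contig": max(contig_lengths),
        "n50": n50(contig_lengths),
    }


if __name__ == "__main__":
    # Toy example: total is 290, so half is 145; 80 + 70 = 150 crosses it at the 70 bp contig
    print(assembly_summary([80, 70, 50, 40, 30, 20]))
```

Assembly size and N50 describe only size and contiguity, so they are best read alongside BUSCO and the k-mer and alignment-based evaluations listed above.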