Exploring an unknown BAM file II (BAM files)

In this exercise, we explore an unfamiliar BAM file. Download it with wget https://download.cncb.ac.cn/gsa2/CRA014803/CRR1048698/CRR1048698.bam before you start. The exercise is similar to a previous one. The BAM file comes from the study: Chen, HX., Liu, ZD., Bai, X. et al. Accurate cross-species 5mC detection for Oxford Nanopore sequencing in plants with DeepPlant. Nat Commun 16, 3227 (2025). The study is open access and freely available.

First, index the file using samtools index.
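A minimal sketch of the indexing step, assuming samtools is on your PATH and the BAM file is coordinate-sorted. The wrapper name index_bam is hypothetical and only adds a check that the file exists before samtools is called.

```bash
# Hypothetical helper: index a BAM file after checking that it exists.
# Assumes samtools is installed and the BAM is coordinate-sorted.
index_bam() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  samtools index "$1"   # by default the index is written as "$1.bai"
}

# Usage (after the download has finished):
#   index_bam CRR1048698.bam
```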

Then answer the following questions.

  1. How was the modification calling performed?

  2. Which organism was sequenced here? What was the reference genome? Hint: You may need to extract data from the BAM file and then search the web for it.

Tip: Any program that creates the BAM file or alters it in some way will usually add a line (typically an @PG record) to the BAM file header. You can use samtools view -H $mod_bam (substitute $mod_bam suitably) to view the header. If the header has many lines, you can pipe the output to head, tail, or shuf to view only a few of them, e.g. samtools view -H $mod_bam | head -n 20.
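Building on the tip, here is one possible way to condense a long header. The two helper functions below are hypothetical names, not part of samtools. They read header text on standard input and rely only on the standard SAM header fields: ID, PN, VN and CL on @PG lines, and SN and LN on @SQ lines.

```bash
# Print ID, program name, version and command line of every @PG record.
# Fields that are absent are shown as "-".
list_programs() {
  awk -F'\t' '$1 == "@PG" {
    id = pn = vn = cl = "-"
    for (i = 2; i <= NF; i++) {
      tag = substr($i, 1, 3); val = substr($i, 4)
      if (tag == "ID:") id = val
      else if (tag == "PN:") pn = val
      else if (tag == "VN:") vn = val
      else if (tag == "CL:") cl = val
    }
    printf "%s\t%s\t%s\t%s\n", id, pn, vn, cl
  }'
}

# Print reference sequence names and lengths from @SQ records, longest first.
list_contigs() {
  awk -F'\t' '$1 == "@SQ" {
    name = len = "-"
    for (i = 2; i <= NF; i++) {
      if (substr($i, 1, 3) == "SN:") name = substr($i, 4)
      else if (substr($i, 1, 3) == "LN:") len = substr($i, 4)
    }
    print name "\t" len
  }' | sort -t "$(printf '\t')" -k2,2nr
}

# Usage:
#   samtools view -H "$mod_bam" | list_programs
#   samtools view -H "$mod_bam" | list_contigs | head -n 20
```

The contig names and lengths are often distinctive enough to identify a genome assembly through a web search.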

  3. How many reads are in the BAM file?

  4. How many secondary/supplementary reads are in the BAM file?

  5. What are the modifications in the BAM file?

  6. Can you extract a few highly modified reads and visualize their patterns?
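The questions on read counts, secondary/supplementary reads, and modification types can be approached in several ways. The sketch below is one of them, under stated assumptions: the function names are hypothetical, the input is SAM text on standard input (for example from samtools view), modifications are stored in the standard MM and ML tags (or the older Mm and Ml spellings), and an ML value of 128 or more is treated as "modified" (roughly probability 0.5 or higher). When a read carries several modification types, the ranking function pools all of their ML values.

```bash
# Count records by class using the SAM FLAG bits
# 0x100 (secondary), 0x800 (supplementary) and 0x4 (unmapped).
count_read_classes() {
  awk -F'\t' '
    { total++
      if (int($2 / 256) % 2) secondary++
      if (int($2 / 2048) % 2) supplementary++
      if (int($2 / 4) % 2) unmapped++ }
    END { printf "total\t%d\nsecondary\t%d\nsupplementary\t%d\nunmapped\t%d\n",
                 total, secondary, supplementary, unmapped }'
}

# Count how many records carry each MM entry, e.g. "C+m?" or "C+h?".
count_mod_types() {
  awk -F'\t' '
    { for (i = 12; i <= NF; i++)
        if ($i ~ /^(MM|Mm):Z:/) {
          n = split(substr($i, 6), parts, ";")
          for (j = 1; j <= n; j++) {
            if (parts[j] == "") continue
            split(parts[j], f, ",")
            seen[f[1]]++
          }
        }
    }
    END { for (k in seen) print k "\t" seen[k] }' | sort
}

# Print read name, number of modification calls, and the fraction of calls
# with ML >= threshold (first argument, default 128), highest fraction first.
rank_by_modification() {
  awk -F'\t' -v thr="${1:-128}" '
    { for (i = 12; i <= NF; i++)
        if ($i ~ /^(ML|Ml):B:C,/) {
          n = split(substr($i, 8), p, ",")
          hi = 0
          for (j = 1; j <= n; j++) if (p[j] + 0 >= thr) hi++
          printf "%s\t%d\t%.3f\n", $1, n, hi / n
        }
    }' | sort -t "$(printf '\t')" -k3,3nr
}

# Usage:
#   samtools view "$mod_bam" | count_read_classes
#   samtools view "$mod_bam" | head -n 10000 | count_mod_types
#   samtools view "$mod_bam" | head -n 10000 | rank_by_modification 128 | head -n 5
```

Streaming the whole file through awk can be slow for large BAM files. For the counts alone, samtools view -c with the -f flag filter (for example -f 0x100 or -f 0x800) gives the same numbers directly. The ranking only considers positions that have a call, so reads with very few calls can rank high by chance; filtering on the second column helps.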
