If you use `samtools view -H` to show the header, you should see the following output:

```bash
samtools view -H CRR1048698.bam
```

```text
@HD VN:1.6 SO:coordinate
@PG ID:basecaller PN:dorado VN:0.3.2+d8660a3 CL:dorado basecaller /data1/baixin/softwares/dorado-0.3.2-linux-x64/model/dna_r10.4.1_e8.2_400bps_hac@v4.2.0 /data3/baixin/arabidopsis2/convertPod5/ --modified-bases-models /data1/baixin/softwares/dorado-0.3.2-linux-x64/model/dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2 --reference /data1/baixin/ref/GCF_000001735.4_TAIR10.1_genomic.mmi --emit-moves
@PG ID:samtools PN:samtools PP:basecaller VN:1.10 CL:samtools sort -@ 20 arabidopsis.all_pass.emit-moves.bam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.13 CL:samtools view -H CRR1048698.bam
@RG ID:43eb2b12dbad38163be0a2df7202d0c79a3f3e43_dna_r10.4.1_e8.2_400bps_hac@v4.2.0 PU:PAO17425 PM:PC48B093 DT:2023-07-17T09:06:50.520+00:00 PL:ONT DS:basecall_model=dna_r10.4.1_e8.2_400bps_hac@v4.2.0 runid=43eb2b12dbad38163be0a2df7202d0c79a3f3e43 LB:20230717-NPL230963-P7-PAO17425-fast SM:20230717-NPL230963-P7-PAO17425-fast
@RG ID:8bb51cf5d932a1e4618444d2819c35139d51f93a_dna_r10.4.1_e8.2_400bps_hac@v4.2.0 PU:PAO17425 PM:PC48B093 DT:2023-07-19T08:14:32.696+00:00 PL:ONT DS:basecall_model=dna_r10.4.1_e8.2_400bps_hac@v4.2.0 runid=8bb51cf5d932a1e4618444d2819c35139d51f93a LB:20230717-NPL230963-P7-PAO17425-fast SM:20230717-NPL230963-P7-PAO17425-fast
@SQ SN:NC_003070.9 LN:30427671
@SQ SN:NC_003071.7 LN:19698289
@SQ SN:NC_003074.8 LN:23459830
@SQ SN:NC_003075.7 LN:18585056
@SQ SN:NC_003076.8 LN:26975502
@SQ SN:NC_037304.1 LN:367808
@SQ SN:NC_000932.1 LN:154478
```

The `@PG ID:basecaller` line tells us that dorado was used for both basecalling and modification calling on this data: the command includes a `--modified-bases-models` argument. We also see that the reference genome is `GCF_000001735.4_TAIR10.1_genomic.mmi`. A quick web search for that accession would take us to a site like `https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001735.4/`, which tells us this is Arabidopsis.

There are about 1.6 million alignment records in the file.
```bash
samtools view -c CRR1048698.bam
```

```text
1596813
```

About 1.1 million of them are primary alignments, which means there are about half a million secondary/supplementary alignments.

```bash
samtools view -c --exclude-flags SECONDARY,SUPPLEMENTARY CRR1048698.bam
```

```text
1079366
```

The types of modifications in the file are C+m and C+h, which are 5mC (methylation) and 5hmC (hydroxymethylation).

```bash
nanalogue peek CRR1048698.bam
```

```text
contigs_and_lengths:
NC_003070.9 30427671
NC_003071.7 19698289
NC_003074.8 23459830
NC_003075.7 18585056
NC_003076.8 26975502
NC_037304.1 367808
NC_000932.1 154478
modifications:
C+h
C+m
```

You can identify a few highly modified reads using solutions from previous exercises, such as the 'Most modified read exercise' from the session where we used dorado to call methylation. You can then visualize these reads using the methods we learnt in our visualization sessions.

You can also download and explore BAM files from other species, such as rice, from this study: https://ngdc.cncb.ac.cn/gsa/browse/CRA014803.
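If you want to repeat the same checks on another BAM file, the commands used in this exercise could be wrapped in a small shell function. This is only a minimal sketch: the function name `summarize_bam` and the example file name are hypothetical, and it assumes `samtools` and `nanalogue` are installed and on your `PATH`.

```bash
# Hypothetical helper (not part of samtools or nanalogue): repeats the
# checks from this exercise on whichever BAM file is passed as argument 1.
summarize_bam() {
    bam="$1"

    # Count all alignment records, then only the primary ones,
    # using the same samtools flags as in the exercise.
    total=$(samtools view -c "$bam")
    primary=$(samtools view -c --exclude-flags SECONDARY,SUPPLEMENTARY "$bam")

    echo "total records: $total"
    echo "primary records: $primary"
    # Whatever is not primary must be secondary or supplementary.
    echo "secondary/supplementary records: $((total - primary))"

    # List contigs and the modification types present.
    nanalogue peek "$bam"
}

# Example call (placeholder file name, replace with your own download):
# summarize_bam rice_sample.bam
```

For `CRR1048698.bam`, the subtraction step corresponds to 1596813 - 1079366 = 517447 secondary/supplementary records, which is the "about half a million" figure quoted in this exercise.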