Printing the header with `samtools view -H` should give the following output:

```bash
samtools view -H CRR1048698.bam
```

```text
@HD     VN:1.6  SO:coordinate
@PG     ID:basecaller   PN:dorado       VN:0.3.2+d8660a3        CL:dorado basecaller /data1/baixin/softwares/dorado-0.3.2-linux-x64/model/dna_r10.4.1_e8.2_400bps_hac@v4.2.0 /data3/baixin/arabidopsis2/convertPod5/ --modified-bases-models /data1/baixin/softwares/dorado-0.3.2-linux-x64/model/dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2 --reference /data1/baixin/ref/GCF_000001735.4_TAIR10.1_genomic.mmi --emit-moves
@PG     ID:samtools     PN:samtools     PP:basecaller   VN:1.10 CL:samtools sort -@ 20 arabidopsis.all_pass.emit-moves.bam
@PG     ID:samtools.1   PN:samtools     PP:samtools     VN:1.13 CL:samtools view -H CRR1048698.bam
@RG     ID:43eb2b12dbad38163be0a2df7202d0c79a3f3e43_dna_r10.4.1_e8.2_400bps_hac@v4.2.0  PU:PAO17425     PM:PC48B093     DT:2023-07-17T09:06:50.520+00:00        PL:ONT  DS:basecall_model=dna_r10.4.1_e8.2_400bps_hac@v4.2.0 runid=43eb2b12dbad38163be0a2df7202d0c79a3f3e43 LB:20230717-NPL230963-P7-PAO17425-fast  SM:20230717-NPL230963-P7-PAO17425-fast
@RG     ID:8bb51cf5d932a1e4618444d2819c35139d51f93a_dna_r10.4.1_e8.2_400bps_hac@v4.2.0  PU:PAO17425     PM:PC48B093     DT:2023-07-19T08:14:32.696+00:00        PL:ONT  DS:basecall_model=dna_r10.4.1_e8.2_400bps_hac@v4.2.0 runid=8bb51cf5d932a1e4618444d2819c35139d51f93a LB:20230717-NPL230963-P7-PAO17425-fast  SM:20230717-NPL230963-P7-PAO17425-fast
@SQ     SN:NC_003070.9  LN:30427671
@SQ     SN:NC_003071.7  LN:19698289
@SQ     SN:NC_003074.8  LN:23459830
@SQ     SN:NC_003075.7  LN:18585056
@SQ     SN:NC_003076.8  LN:26975502
@SQ     SN:NC_037304.1  LN:367808
@SQ     SN:NC_000932.1  LN:154478
```

The `@PG` line for dorado tells us that it both basecalled this data and called modifications (`--modified-bases-models`), and that it aligned the reads to a reference (`--reference`).
The reference index is `GCF_000001735.4_TAIR10.1_genomic.mmi`. A quick web search for the accession
leads to a page such as `https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001735.4/`, which tells us this is Arabidopsis.
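
To pull these details out without reading the long line by eye, the dorado command line can be split into its options. The following is a minimal sketch that works on the `CL:` text copied from the header above. The helper `option_value` and the variable names are only illustrative, and in practice the text could come from `samtools view -H CRR1048698.bam | grep '^@PG'` instead.

```bash
# Command line copied from the "@PG ID:basecaller" header line shown above.
cl='dorado basecaller /data1/baixin/softwares/dorado-0.3.2-linux-x64/model/dna_r10.4.1_e8.2_400bps_hac@v4.2.0 /data3/baixin/arabidopsis2/convertPod5/ --modified-bases-models /data1/baixin/softwares/dorado-0.3.2-linux-x64/model/dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2 --reference /data1/baixin/ref/GCF_000001735.4_TAIR10.1_genomic.mmi --emit-moves'

# Illustrative helper (not part of samtools or dorado): print the value
# that follows a given option in the command line.
option_value() {
    printf '%s\n' "$cl" | grep -o -- "$1 [^ ]*" | cut -d' ' -f2
}

# Keep only the file names, dropping the directory part of each path.
mod_model=$(basename "$(option_value --modified-bases-models)")
reference=$(basename "$(option_value --reference)")

echo "modification model: $mod_model"
echo "reference index:    $reference"
```

The modification model name ends in `5mCG_5hmCG`, which already hints at the two modification types we expect to find in the file.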

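There are about 1.6 million records in the file. Note that `samtools view -c` counts alignment records, so a read with secondary or supplementary alignments is counted more than once.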

```bash
samtools view -c CRR1048698.bam
```

```text
1596813
```

About 1.1 million of them are primary (this count would also include any unmapped reads), which means there are about half a million secondary or supplementary alignments.

```bash
samtools view -c --exclude-flags SECONDARY,SUPPLEMENTARY CRR1048698.bam
```

```text
1079366
```
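
The figure of about half a million comes from subtracting the two counts. A minimal sketch of that arithmetic, with the numbers copied from the outputs above (the variable names are only illustrative):

```bash
# Counts copied from the two samtools outputs above.
total=1596813      # all records in the file
primary=1079366    # records that are neither secondary nor supplementary

# Records that are secondary or supplementary.
non_primary=$((total - primary))
echo "$non_primary"

# The same number as a percentage of all records (integer arithmetic,
# so the result is rounded down).
echo "$((100 * non_primary / total))%"
```

This prints `517447` and `32%`, so roughly a third of the records in the file are secondary or supplementary alignments.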

The file contains two types of modification, `C+m` and `C+h`, which
are 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC).

```bash
nanalogue peek CRR1048698.bam
```

```text
contigs_and_lengths:
NC_003070.9     30427671
NC_003071.7     19698289
NC_003074.8     23459830
NC_003075.7     18585056
NC_003076.8     26975502
NC_037304.1     367808
NC_000932.1     154478

modifications:
C+h
C+m
```
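
As a further check on the species, the contig lengths listed above can be summed and compared with the expected genome size, which for Arabidopsis is roughly 120 Mb. This is only a sketch, with the lengths copied from the output above and `total_bp` as an illustrative variable name:

```bash
# Contig names and lengths copied from the output above.
# awk adds up the second column and prints the total as an integer.
total_bp=$(awk '{ sum += $2 } END { printf "%d\n", sum }' <<'EOF'
NC_003070.9     30427671
NC_003071.7     19698289
NC_003074.8     23459830
NC_003075.7     18585056
NC_003076.8     26975502
NC_037304.1     367808
NC_000932.1     154478
EOF
)

echo "total reference length: $total_bp bp"
```

The total is 119668634 bp, or about 119.7 Mb, which is consistent with an Arabidopsis reference.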

You can identify a few highly modified reads using solutions from previous exercises
such as the 'Most modified read exercise' from the session where we used dorado to
call methylation. You can visualize these using methods we learnt from our visualization
sessions.

You can also download and explore BAM files from other species, such as rice, from this study
at https://ngdc.cncb.ac.cn/gsa/browse/CRA014803.