Ans: chrVI:190000-200000 Step 1: Isolate reads with modifications. Use our script to form a mod BAM file with the XC tag containg read modifcation counts. Then, we isolate the modified strands. We use the criterion that on average at least 5 thymidines per every 100 thymidines must be modified. So the modification rate must be at least 5% when we count only thymidines, or 1.25% when we count all four canonical DNA bases. Any other reasonable criterion is ok. ```bash input_mod_bam=~/nanomod_course_data/yeast/subset_2.sorted.bam thresholded_mod_bam= # fill suitably modified_reads_mod_bam= # fill suitably modified_reads_mod_bam_adjust_tag= # fill suitably # perform thresholding modkit call-mods --no-filtering $input_mod_bam $thresholded_mod_bam cd ~/nanomod_course_scripts/nanopore-mod-course/code # count number of modifications per read and select # those where average modification rate > 1.25% as explained # above samtools view -h $thresholded_mod_bam | awk -f count_mods_per_read.awk |\ samtools view -b -e '[XC]/rlen>0.0125' -o $modified_reads_mod_bam ``` Step 2: One can subset this to retain reads above a read length or subsample it to include only a fraction e.g.: `samtools view -s 0.2 -e 'rlen>40000' -o $op_bam $modified_reads_mod_bam`. This is an optional step. We are not doing this as a part of this solution. Step 3: The command below produces statistics of the region chrVI:150000-160000. Note down the statistics in the file `probabilities.txt` in the directory ./histogram_exercise and repeat this command for the other regions. ```bash # adjust the tag of the modified-read mod bam file modkit adjust-mods --convert T B $modified_reads_mod_bam \ $modified_reads_mod_bam_adjust_tag samtools index $modified_reads_mod_bam_adjust_tag region_file= # fill suitably echo -e "chrVI\t150000\t160000\tA\t1000\t." > $region_file; # cd to a suitable directory which can hold # the histogram_exercise directory modkit sample-probs $modified_reads_mod_bam_adjust_tag -o ./histogram_exercise \ --hist --buckets 2 --include-bed $region_file --force # --force above just overwrites files # in the ./histogram_exercise directory ```