Genome-wide Melting Temperature Profiling: An E. coli Case Study

Junhui Li, Lihua Julie Zhu

2026-06-03

Introduction

This vignette demonstrates how to compute melting temperature (Tm) across an entire bacterial genome in a reproducible, end-to-end workflow using TmCalculator. We use Escherichia coli K-12 MG1655 (NCBI assembly GCF_000005845.2 / ASM584v2) as the reference organism, calculating Tm for every non-overlapping 200 bp window along the single circular chromosome.

The four steps covered here are:

  1. Build a BSgenome data package from the NCBI assembly accession.
  2. Generate non-overlapping genomic windows with make_genomiccoord().
  3. Compute Tm and GC content for all windows with tm_calculate().
  4. Visualise the genome-wide Tm profile as a circular (Circos) plot, optionally overlaid with multi-omics tracks.
  5. Test if the Tm values in the mutH peaks are significantly different from the rest of the genome with test_granges_targets().

By the end of this vignette you will have a publication-ready circular genome map showing how sequence composition—quantified through Tm—varies around the E. coli chromosome, and how it relates to independently measured genomic features such as mismatch-repair hotspots, microsatellite density, and cruciform-forming sequences.


Prerequisites

R packages

# Bioconductor core
#BiocManager::install(c(
#  "BSgenomeForge",   # forge custom BSgenome packages from NCBI
#  "BSgenome",
#  "GenomicRanges"
#))

library(TmCalculator)
library(BSgenome)
library(GenomicRanges)

Building the E. coli BSgenome data package

TmCalculator retrieves sequences from BSgenome data packages. Because an E. coli BSgenome package is not on Bioconductor, we forge one locally from the NCBI RefSeq assembly using BSgenomeForge. This step only needs to be run once; thereafter, install the resulting package and load it like any other BSgenome package.

library(BSgenomeForge)

# Download the FASTA and build the data package (takes ~1–2 min)
forgeBSgenomeDataPkgFromNCBI(
  assembly_accession = "GCF_000005845.2",
  pkg_maintainer     = "Junhui Li <ljh.biostat@gmail.com>",
  destdir            = "."        # writes BSgenome.Ecoli.NCBI.ASM584v2/ here
)

# Install the freshly forged package
install.packages(
  "./BSgenome.Ecoli.NCBI.ASM584v2",
  repos = NULL,
  type  = "source"
)

The GitHub repo ships package metadata only; sequence data must be forged once. The chunk below installs the package (if needed), forges it from NCBI when the Ecoli genome object is still missing, then loads it. Knit from the top so this chunk runs before load-genome.

Load the genome (object Ecoli; package BSgenome.Ecoli.NCBI.ASM584v2 for coordinates):

ecoli_pkg  <- "BSgenome.Ecoli.NCBI.ASM584v2"
genome_obj <- "Ecoli"

suppressPackageStartupMessages(library(ecoli_pkg, character.only = TRUE))
genome      <- base::get(genome_obj, envir = asNamespace(ecoli_pkg))
genome_name <- ecoli_pkg
chr_name    <- "U00096.3"
chr_length  <- length(genome[[chr_name]])

cat("Chromosome:", chr_name, "\n")
cat("Length:    ", format(chr_length, big.mark = ","), "bp\n")

Step 1 — Generate Genomic Windows

make_genomiccoord() tiles the chromosome into non-overlapping 200 bp bins and returns a character vector of coordinate strings accepted by coor_to_genomic_ranges(). Setting slide = window (both 200 bp) ensures the bins are contiguous and non-overlapping.

bins_gc <- make_genomiccoord(
  bsgenome     = genome,
  chromosomes  = chr_name,
  window       = 200L,
  slide        = 200L,          # non-overlapping: slide == window
  start        = 1,
  end          = chr_length,
  strand       = "+"
)

cat("Total windows:", length(bins_gc), "\n")
cat("Last window:  ", bins_gc[length(bins_gc)], "\n")

coor_to_genomic_ranges() resolves each coordinate string against the BSgenome package and returns a GRanges object carrying the full sequence and its reverse complement:

input_new <- list(
  pkg_name = genome_name,
  seq      = bins_gc
)

runtime1 <- system.time({
  gr_batch <- to_genomic_ranges_fast(input_new)
})

cat(sprintf(
  "Coordinate resolution: %.2f s (elapsed)\n",
  runtime1["elapsed"]
))

Step 2 — Compute Genome-wide Tm

We calculate Tm for every window using the nearest-neighbor (NN) method with the SantaLucia & Hicks (2004) parameter set and a standard monovalent salt concentration of 50 mM Na⁺. GC content is computed alongside Tm automatically.

runtime2 <- system.time({
  tm_ASM584v2 <- tm_calculate(
    gr_batch,
    method   = "tm_nn",
    nn_table = "DNA_NN_SantaLucia_2004",
    Na       = 50            # mM; standard PCR-like conditions
  )
})

cat(sprintf(
  "Tm calculation: %.2f s (elapsed) for %s windows\n",
  runtime2["elapsed"],
  format(length(bins_gc), big.mark = ",")
))

The combined wall-clock time for both steps on a standard desktop is typically under 10 seconds for the full E. coli genome at 200 bp resolution.

Extract the Tm and GC-content columns into a plain data frame for downstream plotting:

Tm <- as.data.frame(tm_ASM584v2$gr[, c(4, 6)])

# Quick summary
summary(Tm[, c("Tm", "GC")])

Step 3 — Visualise the Genome-wide Tm Profile

Assemble multi-track data

We overlay the Tm/GC profile with four independently measured genomic datasets bundled in the ecoli_rep_hotspots data object:

Track Source
MutL-AR MutL protein ChIP-seq peaks (mismatch-repair activity)
Microsatellites Tandem-repeat density per 200 bp bin
Cruciform Cruciform-forming sequence density per 200 bp bin
ssDNA Single-stranded DNA regions
GATC sites GATC methylation-site density per 200 bp bin
data(ecoli_rep_hotspots)

# Reference labels: replication origin (ori) and terminus (dif)
label <- data.frame(
  seqnames = genome_name,
  start    = c(3925804, 1590777),
  end      = c(3925804, 1590777),
  label    = c("ori", "dif")
)
tracks <- list(
  # --- Protein interaction ---
  list(
    type   = "rect",
    data   = ecoli_rep_hotspots$all_peaks_IP_mutH,
    col    = "black",
    bg.col = "grey",
    name   = "MutL-AR"
  ),

  # --- Sequence thermodynamics ---
  list(
    type      = "line",
    data      = Tm,
    value_col = "GC",
    name      = "GC content"
  ),
  list(
    type      = "line",
    data      = Tm,
    value_col = "Tm",
    name      = "Melting temp"
  ),

  # --- Repeat / structural features ---
  list(
    type      = "line",
    data      = ecoli_rep_hotspots$bins_rep,
    value_col = "count",
    name      = "Microsatellites"
  ),
  list(
    type      = "line",
    data      = ecoli_rep_hotspots$bins_cru,
    value_col = "count",
    name      = "Cruciform"
  ),

  # --- ssDNA regions ---
  list(
    data = ecoli_rep_hotspots$ssdna,
    name = "ssDNA"
  ),

  # --- GATC methylation sites ---
  list(
    type      = "line",
    data      = ecoli_rep_hotspots$bins_gatc,
    value_col = "count",
    name      = "GATC sites"
  )
)

Full circular genome map

plot_circos_genome() renders all tracks as concentric rings around the chromosome, with the replication origin (ori) and terminus (dif) labelled:

plot_circos_genome(
  genome_name = genome_name,
  genome_size = chr_length,
  track_list  = tracks,
  canvas.xlim = c(-1, 1),
  canvas.ylim = c(-1, 1),
  label       = label
)

Zoomed view — upper-right quadrant

To highlight a genomic region in detail, restrict canvas.xlim and canvas.ylim to a quadrant. The plot below focuses on the first quadrant (upper right), which spans roughly the first quarter of the chromosome starting from ori:

plot_circos_genome(
  genome_name = genome_name,
  genome_size = chr_length,
  track_list  = tracks,
  canvas.xlim = c(0.5, 1),
  canvas.ylim = c(0,   1)
)

Step 4 — Test if the Tm values in the mutH peaks are significantly different from the rest of the genome

Step 1 — Build the mutH peak GRanges

mutH_peaks <- GRanges(
  seqnames = ecoli_rep_hotspots$all_peaks_IP_mutH$chr,         # NC_000913.3
  ranges   = IRanges(start = ecoli_rep_hotspots$all_peaks_IP_mutH$start,
                     end   = ecoli_rep_hotspots$all_peaks_IP_mutH$end)
)

Step 2 — Align seqlevels: NC_000913.3 → U00096.3

# tm_ASM584v2 uses the BSgenome name "U00096.3"; remap the peaks to match.
seqlevels(mutH_peaks) <- "U00096.3"

Step 3 — Annotate each Tm tile with which peak it falls into

mutH_peaks$peak_id <- paste0("mutH_", seq_along(mutH_peaks))

tm_annot <- integrate_granges(
  gr_tm        = tm_ASM584v2$gr,
  gr_features  = mutH_peaks,
  strategy     = "overlap",
  feature_cols = "peak_id",
  keep_unmatched = TRUE   # keep non-overlapping tiles so they form the "non_peak" group
)

Step 4 — Add a two-level grouping column

tm_annot$in_mutH <- ifelse(is.na(tm_annot$peak_id), "non_peak", "peak")
table(tm_annot$in_mutH)

Step 5 — Wilcoxon on Tm and GC, peaks vs the rest of the genome

res <- compare_groups(
  gr           = tm_annot,
  target       = c("Tm", "GC"),
  method       = "wilcoxon",
  group        = "in_mutH",
  alternative  = "greater",   # tests "peak > non_peak"
  posthoc      = FALSE     # only 2 groups, no post-hoc needed
)
res$results   # one row per target: test, statistic, p.value
res$summary   # per-group n, mean, sd, median

Interpreting the Results

MutL-AR-associated windows have lower Tm and higher GC content than the rest of the genome (p-value < 0.001 for both Tm and GC).

MutL-AR peaks cluster near the replication terminus. The mismatch-repair protein MutL accumulates preferentially in the terminus region (around dif), where replication forks converge and mismatch density is highest. This spatial coincidence with locally elevated microsatellite density and cruciform-forming sequence is consistent with the replication stress model of MMR recruitment.

GATC methylation sites are distributed genome-wide but show local density fluctuations that partially anti-correlate with GC content, reflecting the sequence context requirements for Dam methyltransferase (5′-GATC-3′).


For human-scale analyses (non-overlapping 200 bp windows across the ~3.1 Gb haploid genome, ~15.5 M windows), we recommend running tm_calculate() in parallelly using BiocParallel and splitting input by chromosome. Memory requirements scale approximately linearly with the number of windows.


Session Information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.72.0                  
##  [3] rtracklayer_1.64.0                BiocIO_1.14.0                    
##  [5] Biostrings_2.72.1                 XVector_0.44.0                   
##  [7] GenomicRanges_1.56.2              GenomeInfoDb_1.40.1              
##  [9] IRanges_2.38.1                    S4Vectors_0.42.1                 
## [11] BiocGenerics_0.50.0               TmCalculator_1.0.5               
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3          rstudioapi_0.18.0          
##   [3] jsonlite_2.0.0              shape_1.4.6.1              
##   [5] magrittr_2.0.4              GenomicFeatures_1.56.0     
##   [7] farver_2.1.2                rmarkdown_2.30             
##   [9] GlobalOptions_0.1.2         zlibbioc_1.50.0            
##  [11] vctrs_0.7.1                 memoise_2.0.1              
##  [13] Rsamtools_2.20.0            RCurl_1.98-1.17            
##  [15] base64enc_0.1-6             htmltools_0.5.9            
##  [17] S4Arrays_1.4.1              progress_1.2.3             
##  [19] curl_6.2.3                  SparseArray_1.4.8          
##  [21] Formula_1.2-5               sass_0.4.10                
##  [23] bslib_0.10.0                htmlwidgets_1.6.4          
##  [25] plyr_1.8.9                  Gviz_1.48.0                
##  [27] httr2_1.1.2                 plotly_4.10.4              
##  [29] cachem_1.1.0                GenomicAlignments_1.40.0   
##  [31] lifecycle_1.0.5             pkgconfig_2.0.3            
##  [33] Matrix_1.7-0                R6_2.6.1                   
##  [35] fastmap_1.2.0               GenomeInfoDbData_1.2.12    
##  [37] MatrixGenerics_1.16.0       digest_0.6.39              
##  [39] colorspace_2.1-1            GGally_2.2.1               
##  [41] OrganismDbi_1.46.0          AnnotationDbi_1.66.0       
##  [43] bezier_1.1.2                regioneR_1.36.0            
##  [45] Hmisc_5.2-3                 RSQLite_2.4.0              
##  [47] labeling_0.4.3              filelock_1.0.3             
##  [49] httr_1.4.7                  abind_1.4-8                
##  [51] compiler_4.4.1              withr_3.0.2                
##  [53] bit64_4.6.0-1               htmlTable_2.4.3            
##  [55] backports_1.5.0             BiocParallel_1.38.0        
##  [57] viridis_0.6.5               DBI_1.2.3                  
##  [59] ggstats_0.9.0               biomaRt_2.60.1             
##  [61] MASS_7.3-61                 rappdirs_0.3.4             
##  [63] DelayedArray_0.30.1         rjson_0.2.23               
##  [65] tools_4.4.1                 foreign_0.8-87             
##  [67] otel_0.2.0                  karyoploteR_1.30.0         
##  [69] nnet_7.3-19                 glue_1.8.0                 
##  [71] restfulr_0.0.15             grid_4.4.1                 
##  [73] checkmate_2.3.2             ade4_1.7-23                
##  [75] reshape2_1.4.4              cluster_2.1.6              
##  [77] generics_0.1.4              seqinr_4.2-36              
##  [79] gtable_0.3.6                tidyr_1.3.1                
##  [81] ensembldb_2.28.1            data.table_1.17.4          
##  [83] hms_1.1.3                   xml2_1.5.2                 
##  [85] pillar_1.10.2               stringr_1.6.0              
##  [87] circlize_0.4.16             dplyr_1.1.4                
##  [89] BiocFileCache_2.12.0        lattice_0.22-6             
##  [91] bit_4.6.0                   deldir_2.0-4               
##  [93] biovizBase_1.52.0           RBGL_1.80.0                
##  [95] tidyselect_1.2.1            knitr_1.51                 
##  [97] gridExtra_2.3               ggbio_1.52.0               
##  [99] ProtGenerics_1.36.0         SummarizedExperiment_1.34.0
## [101] xfun_0.56                   Biobase_2.64.0             
## [103] matrixStats_1.5.0           stringi_1.8.7              
## [105] UCSC.utils_1.0.0            lazyeval_0.2.2             
## [107] yaml_2.3.12                 evaluate_1.0.5             
## [109] codetools_0.2-20            interp_1.1-6               
## [111] tibble_3.2.1                graph_1.82.0               
## [113] BiocManager_1.30.27         cli_3.6.5                  
## [115] rpart_4.1.23                jquerylib_0.1.4            
## [117] dichromat_2.0-0.1           Rcpp_1.1.1                 
## [119] dbplyr_2.5.0                png_0.1-8                  
## [121] XML_3.99-0.18               parallel_4.4.1             
## [123] ggplot2_3.5.2               blob_1.2.4                 
## [125] prettyunits_1.2.0           latticeExtra_0.6-30        
## [127] jpeg_0.1-11                 AnnotationFilter_1.28.0    
## [129] bitops_1.0-9                txdbmaker_1.0.1            
## [131] viridisLite_0.4.2           VariantAnnotation_1.50.0   
## [133] scales_1.4.0                purrr_1.2.1                
## [135] crayon_1.5.3                bamsignals_1.36.0          
## [137] rlang_1.1.7                 KEGGREST_1.44.1