This vignette demonstrates how to compute melting temperature (Tm) across an entire bacterial genome in a reproducible, end-to-end workflow using TmCalculator. We use Escherichia coli K-12 MG1655 (NCBI assembly GCF_000005845.2 / ASM584v2) as the reference organism, calculating Tm for every non-overlapping 200 bp window along the single circular chromosome.
The four steps covered here are:
make_genomiccoord().tm_calculate().test_granges_targets().By the end of this vignette you will have a publication-ready circular genome map showing how sequence composition—quantified through Tm—varies around the E. coli chromosome, and how it relates to independently measured genomic features such as mismatch-repair hotspots, microsatellite density, and cruciform-forming sequences.
TmCalculator retrieves sequences from BSgenome data packages. Because an E. coli BSgenome package is not on Bioconductor, we forge one locally from the NCBI RefSeq assembly using BSgenomeForge. This step only needs to be run once; thereafter, install the resulting package and load it like any other BSgenome package.
library(BSgenomeForge)
# Download the FASTA and build the data package (takes ~1–2 min)
forgeBSgenomeDataPkgFromNCBI(
assembly_accession = "GCF_000005845.2",
pkg_maintainer = "Junhui Li <ljh.biostat@gmail.com>",
destdir = "." # writes BSgenome.Ecoli.NCBI.ASM584v2/ here
)
# Install the freshly forged package
install.packages(
"./BSgenome.Ecoli.NCBI.ASM584v2",
repos = NULL,
type = "source"
)The GitHub repo ships package metadata only; sequence data must be
forged once. The chunk below installs the package (if needed), forges it
from NCBI when the Ecoli genome object is still missing,
then loads it. Knit from the top so this chunk runs
before load-genome.
Load the genome (object Ecoli; package
BSgenome.Ecoli.NCBI.ASM584v2 for coordinates):
ecoli_pkg <- "BSgenome.Ecoli.NCBI.ASM584v2"
genome_obj <- "Ecoli"
suppressPackageStartupMessages(library(ecoli_pkg, character.only = TRUE))
genome <- base::get(genome_obj, envir = asNamespace(ecoli_pkg))
genome_name <- ecoli_pkg
chr_name <- "U00096.3"
chr_length <- length(genome[[chr_name]])
cat("Chromosome:", chr_name, "\n")
cat("Length: ", format(chr_length, big.mark = ","), "bp\n")make_genomiccoord() tiles the chromosome into
non-overlapping 200 bp bins and returns a character vector of coordinate
strings accepted by coor_to_genomic_ranges(). Setting
slide = window (both 200 bp) ensures the bins are
contiguous and non-overlapping.
bins_gc <- make_genomiccoord(
bsgenome = genome,
chromosomes = chr_name,
window = 200L,
slide = 200L, # non-overlapping: slide == window
start = 1,
end = chr_length,
strand = "+"
)
cat("Total windows:", length(bins_gc), "\n")
cat("Last window: ", bins_gc[length(bins_gc)], "\n")coor_to_genomic_ranges() resolves each coordinate string
against the BSgenome package and returns a GRanges object
carrying the full sequence and its reverse complement:
input_new <- list(
pkg_name = genome_name,
seq = bins_gc
)
runtime1 <- system.time({
gr_batch <- to_genomic_ranges_fast(input_new)
})
cat(sprintf(
"Coordinate resolution: %.2f s (elapsed)\n",
runtime1["elapsed"]
))We calculate Tm for every window using the nearest-neighbor (NN) method with the SantaLucia & Hicks (2004) parameter set and a standard monovalent salt concentration of 50 mM Na⁺. GC content is computed alongside Tm automatically.
runtime2 <- system.time({
tm_ASM584v2 <- tm_calculate(
gr_batch,
method = "tm_nn",
nn_table = "DNA_NN_SantaLucia_2004",
Na = 50 # mM; standard PCR-like conditions
)
})
cat(sprintf(
"Tm calculation: %.2f s (elapsed) for %s windows\n",
runtime2["elapsed"],
format(length(bins_gc), big.mark = ",")
))The combined wall-clock time for both steps on a standard desktop is typically under 10 seconds for the full E. coli genome at 200 bp resolution.
Extract the Tm and GC-content columns into a plain data frame for downstream plotting:
We overlay the Tm/GC profile with four independently measured genomic
datasets bundled in the ecoli_rep_hotspots data object:
| Track | Source |
|---|---|
| MutL-AR | MutL protein ChIP-seq peaks (mismatch-repair activity) |
| Microsatellites | Tandem-repeat density per 200 bp bin |
| Cruciform | Cruciform-forming sequence density per 200 bp bin |
| ssDNA | Single-stranded DNA regions |
| GATC sites | GATC methylation-site density per 200 bp bin |
data(ecoli_rep_hotspots)
# Reference labels: replication origin (ori) and terminus (dif)
label <- data.frame(
seqnames = genome_name,
start = c(3925804, 1590777),
end = c(3925804, 1590777),
label = c("ori", "dif")
)tracks <- list(
# --- Protein interaction ---
list(
type = "rect",
data = ecoli_rep_hotspots$all_peaks_IP_mutH,
col = "black",
bg.col = "grey",
name = "MutL-AR"
),
# --- Sequence thermodynamics ---
list(
type = "line",
data = Tm,
value_col = "GC",
name = "GC content"
),
list(
type = "line",
data = Tm,
value_col = "Tm",
name = "Melting temp"
),
# --- Repeat / structural features ---
list(
type = "line",
data = ecoli_rep_hotspots$bins_rep,
value_col = "count",
name = "Microsatellites"
),
list(
type = "line",
data = ecoli_rep_hotspots$bins_cru,
value_col = "count",
name = "Cruciform"
),
# --- ssDNA regions ---
list(
data = ecoli_rep_hotspots$ssdna,
name = "ssDNA"
),
# --- GATC methylation sites ---
list(
type = "line",
data = ecoli_rep_hotspots$bins_gatc,
value_col = "count",
name = "GATC sites"
)
)plot_circos_genome() renders all tracks as concentric
rings around the chromosome, with the replication origin (ori) and
terminus (dif) labelled:
To highlight a genomic region in detail, restrict
canvas.xlim and canvas.ylim to a quadrant. The
plot below focuses on the first quadrant (upper right), which spans
roughly the first quarter of the chromosome starting from ori:
res <- compare_groups(
gr = tm_annot,
target = c("Tm", "GC"),
method = "wilcoxon",
group = "in_mutH",
alternative = "greater", # tests "peak > non_peak"
posthoc = FALSE # only 2 groups, no post-hoc needed
)
res$results # one row per target: test, statistic, p.value
res$summary # per-group n, mean, sd, medianMutL-AR-associated windows have lower Tm and higher GC content than the rest of the genome (p-value < 0.001 for both Tm and GC).
MutL-AR peaks cluster near the replication terminus. The mismatch-repair protein MutL accumulates preferentially in the terminus region (around dif), where replication forks converge and mismatch density is highest. This spatial coincidence with locally elevated microsatellite density and cruciform-forming sequence is consistent with the replication stress model of MMR recruitment.
GATC methylation sites are distributed genome-wide but show local density fluctuations that partially anti-correlate with GC content, reflecting the sequence context requirements for Dam methyltransferase (5′-GATC-3′).
For human-scale analyses (non-overlapping 200 bp windows across the
~3.1 Gb haploid genome, ~15.5 M windows), we recommend running
tm_calculate() in parallelly using
BiocParallel and splitting input by chromosome. Memory
requirements scale approximately linearly with the number of
windows.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.72.0
## [3] rtracklayer_1.64.0 BiocIO_1.14.0
## [5] Biostrings_2.72.1 XVector_0.44.0
## [7] GenomicRanges_1.56.2 GenomeInfoDb_1.40.1
## [9] IRanges_2.38.1 S4Vectors_0.42.1
## [11] BiocGenerics_0.50.0 TmCalculator_1.0.5
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.18.0
## [3] jsonlite_2.0.0 shape_1.4.6.1
## [5] magrittr_2.0.4 GenomicFeatures_1.56.0
## [7] farver_2.1.2 rmarkdown_2.30
## [9] GlobalOptions_0.1.2 zlibbioc_1.50.0
## [11] vctrs_0.7.1 memoise_2.0.1
## [13] Rsamtools_2.20.0 RCurl_1.98-1.17
## [15] base64enc_0.1-6 htmltools_0.5.9
## [17] S4Arrays_1.4.1 progress_1.2.3
## [19] curl_6.2.3 SparseArray_1.4.8
## [21] Formula_1.2-5 sass_0.4.10
## [23] bslib_0.10.0 htmlwidgets_1.6.4
## [25] plyr_1.8.9 Gviz_1.48.0
## [27] httr2_1.1.2 plotly_4.10.4
## [29] cachem_1.1.0 GenomicAlignments_1.40.0
## [31] lifecycle_1.0.5 pkgconfig_2.0.3
## [33] Matrix_1.7-0 R6_2.6.1
## [35] fastmap_1.2.0 GenomeInfoDbData_1.2.12
## [37] MatrixGenerics_1.16.0 digest_0.6.39
## [39] colorspace_2.1-1 GGally_2.2.1
## [41] OrganismDbi_1.46.0 AnnotationDbi_1.66.0
## [43] bezier_1.1.2 regioneR_1.36.0
## [45] Hmisc_5.2-3 RSQLite_2.4.0
## [47] labeling_0.4.3 filelock_1.0.3
## [49] httr_1.4.7 abind_1.4-8
## [51] compiler_4.4.1 withr_3.0.2
## [53] bit64_4.6.0-1 htmlTable_2.4.3
## [55] backports_1.5.0 BiocParallel_1.38.0
## [57] viridis_0.6.5 DBI_1.2.3
## [59] ggstats_0.9.0 biomaRt_2.60.1
## [61] MASS_7.3-61 rappdirs_0.3.4
## [63] DelayedArray_0.30.1 rjson_0.2.23
## [65] tools_4.4.1 foreign_0.8-87
## [67] otel_0.2.0 karyoploteR_1.30.0
## [69] nnet_7.3-19 glue_1.8.0
## [71] restfulr_0.0.15 grid_4.4.1
## [73] checkmate_2.3.2 ade4_1.7-23
## [75] reshape2_1.4.4 cluster_2.1.6
## [77] generics_0.1.4 seqinr_4.2-36
## [79] gtable_0.3.6 tidyr_1.3.1
## [81] ensembldb_2.28.1 data.table_1.17.4
## [83] hms_1.1.3 xml2_1.5.2
## [85] pillar_1.10.2 stringr_1.6.0
## [87] circlize_0.4.16 dplyr_1.1.4
## [89] BiocFileCache_2.12.0 lattice_0.22-6
## [91] bit_4.6.0 deldir_2.0-4
## [93] biovizBase_1.52.0 RBGL_1.80.0
## [95] tidyselect_1.2.1 knitr_1.51
## [97] gridExtra_2.3 ggbio_1.52.0
## [99] ProtGenerics_1.36.0 SummarizedExperiment_1.34.0
## [101] xfun_0.56 Biobase_2.64.0
## [103] matrixStats_1.5.0 stringi_1.8.7
## [105] UCSC.utils_1.0.0 lazyeval_0.2.2
## [107] yaml_2.3.12 evaluate_1.0.5
## [109] codetools_0.2-20 interp_1.1-6
## [111] tibble_3.2.1 graph_1.82.0
## [113] BiocManager_1.30.27 cli_3.6.5
## [115] rpart_4.1.23 jquerylib_0.1.4
## [117] dichromat_2.0-0.1 Rcpp_1.1.1
## [119] dbplyr_2.5.0 png_0.1-8
## [121] XML_3.99-0.18 parallel_4.4.1
## [123] ggplot2_3.5.2 blob_1.2.4
## [125] prettyunits_1.2.0 latticeExtra_0.6-30
## [127] jpeg_0.1-11 AnnotationFilter_1.28.0
## [129] bitops_1.0-9 txdbmaker_1.0.1
## [131] viridisLite_0.4.2 VariantAnnotation_1.50.0
## [133] scales_1.4.0 purrr_1.2.1
## [135] crayon_1.5.3 bamsignals_1.36.0
## [137] rlang_1.1.7 KEGGREST_1.44.1