---
title: "Clustering of Hi-C contact maps"
author: "Shubham Chaturvedi, Pierre Neuvial, Nathalie Vialaneix"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Clustering of Hi-C contact maps}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r skipNoHITC}
# IMPORTANT: this vignette cannot be built if HiTC is not installed;
# in that case, evaluation of all subsequent chunks is disabled
if (!require("HiTC", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}
```

# Introduction

Hi-C is a sequencing-based molecular assay designed to measure intra- and inter-chromosomal interactions between regions of the DNA molecule. In particular, the identification of Topologically-Associated Domains (TADs), that is, of regions of the genome in which physical interactions are frequent, provides insight into the three-dimensional organization of a genome [2].

Hi-C data are in the form of two-dimensional *contact maps*, *i.e.*, matrices whose $(i,j)$ entry quantifies the intensity of the physical interaction between two genome regions $i$ and $j$ at the DNA level.

In this vignette, we demonstrate the use of `adjclust::hicClust` to perform adjacency-constrained hierarchical agglomerative clustering (HAC) of Hi-C contact maps. The output of this function is a dendrogram, which can be cut to identify TADs. The algorithm used for adjacency-constrained HAC is described in [1,3].

```{r loadLib, message=FALSE}
library("adjclust")
```

# Loading and displaying a sample Hi-C contact map

The data set `hic_imr90_40_XX` is an object of class `HTCexp` which has been obtained from the `HiTC` package [4]. It is a contact map corresponding to the first 500 x 500 bins on chromosome X vs chromosome X.

```{r loadData}
load(system.file("extdata", "hic_imr90_40_XX.rda", package = "adjclust"))
```

The path to the script used to create this map is returned by the following command:

```{r create-data-script}
system.file("system/create_hic_chrXchrX.R", package="adjclust")
```

We now display the data:

```{r mapHiC, message=FALSE}
HiTC::mapC(hic_imr90_40_XX)
```

# Using `hicClust`

`hicClust` operates directly on objects of class `HTCexp`:

```{r hicClust-HTCexp, message=FALSE}
fit <- hicClust(hic_imr90_40_XX)
```

It is also possible to work on binned data. Below we choose a bin size of $5 \times 10^5$:

```{r binning, message=FALSE}
# binsize is 5e5 to match the bin size stated in the text
binned <- HiTC::binningC(hic_imr90_40_XX, binsize = 5e5)
fitB <- hicClust(binned)
fitB
```

```{r plotBinned, message=FALSE}
HiTC::mapC(binned)
```

The output is of class `chac`. It can be plotted as a dendrogram; the `plot` method relies internally on the function `plot.dendrogram`:

```{r dendro}
plot(fitB, mode = "corrected")
```

Moreover, the output contains an element named `merge`, which describes the successive merges of the clustering, and an element named `gains`, which gives the improvement in the criterion optimized by the clustering at each successive merge.

```{r objectDesc}
head(cbind(fitB$merge, fitB$gains))
```

# Other types of input

Contact maps can also be stored as objects of class `Matrix::dsCMatrix`, or as plain text files. These types of input are also accepted as first argument to `hicClust`.

# References

[1] Ambroise C., Dehman A., Neuvial P., Rigaill G., and Vialaneix N. (2019). Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. *Algorithms for Molecular Biology*, **14**, 22.

[2] Dixon J.R., *et al.* (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. *Nature*, **485**(7398), 376.

[3] Randriamihamison N., Vialaneix N., and Neuvial P. (2021). Applicability and interpretability of Ward's hierarchical agglomerative clustering with or without contiguity constraints. *Journal of Classification*, **38**, 363–389.

[4] Servant N., *et al.* (2012). HiTC: Exploration of High-Throughput 'C' experiments. *Bioinformatics*, **28**(21), 2843–2844.

# Session information

```{r session}
sessionInfo()
```
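
# Appendix: a toy sparse-matrix input

As a complement to the section on other types of input, the chunk below is a minimal sketch of how a contact map stored as a `Matrix::dsCMatrix` might be passed to `hicClust`. The 6 x 6 matrix and its values are made up for illustration and are not derived from real Hi-C data. The names `p`, `toy`, `sparse` and `fitS` are hypothetical. The sketch assumes the default arguments of `hicClust` are suitable for such a small matrix.

```{r sparseInput, message=FALSE}
library("adjclust")

# Toy symmetric contact map on p = 6 loci (made-up values, illustration only):
# two blocks of 3 loci with strong internal contacts, one weak contact
# between the adjacent loci 3 and 4, and no contact elsewhere.
p <- 6
toy <- matrix(0, nrow = p, ncol = p)
toy[1:3, 1:3] <- 10
toy[4:6, 4:6] <- 10
toy[3, 4] <- toy[4, 3] <- 1
diag(toy) <- 20

# Store the map as a symmetric sparse matrix (class dsCMatrix)
sparse <- Matrix::forceSymmetric(Matrix::Matrix(toy, sparse = TRUE))
class(sparse)

# Same call as for an HTCexp object
fitS <- hicClust(sparse)
fitS$merge
```

As with the `HTCexp` input, the result can be inspected through its `merge` and `gains` elements.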