--- title: "ICDS User Guide" author: "Junwei Han" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{ICDS User Guide} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r style, echo=FALSE, results="asis", message=FALSE} knitr::opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE) ``` ## 1 Introduce This vignette illustrates how to easily use the ICDS package. This package can identifying cancer risk subpathways which can explain the disturbed biological functions in more detail and accurately. + This package provides the `getExpp`,`getMethp`,`getCnvp`function to calculate p-values or corrected p-values for each gene. + This package provides the `coverp2zscore`,`combinep_two`,`combinep_three` function to Convert p-values or corrected p-values to z-scores. + This package provides the `FindSubPath` function to search for interested subpathways in each entire pathway. + This package provides the `opt_subpath` function to Optimize interested subpathways. + This package provides the `Permutation` function to to calculate statistical significance for these interested subpathways . ## 2 Example: Obtain p-values or corrected p-values for each gene We can use function getExampleData to return example data and environment variables, such as the data of exp_data. ```{r} library(ICDS) # obtain the expression profile data exp_data<-GetExampleData("exp_data") #view first six rows and six colmns of data exp_data[1:6, 1:6] ``` ```{r eval=TRUE} #obtain the labels of the samples of the expression profile, the label vector is a vector of 0/1s, # 0 represents the case sample and 1 represents the control sample label1<-GetExampleData("label1") #view first ten label label1[1:10] ``` we used the Student's t-test to calculate the p-value for expression level and methylation level of each gene in the tumor and normal samples.label,0 represents normal samples;1 represents cancer samples. ```{r} #calculate p-values or corrected p-values for each gene exp.p<-getExpp(exp_data,label = label1,p.adjust = FALSE) label2<-GetExampleData("label2") meth_data<-GetExampleData("meth_data") meth.p<-getMethp(meth_data,label = label2,p.adjust = FALSE) exp.p[1:10,] ``` For copy number data, we can calculate p-value through the following way: ```{r} #obtain Copy number variation data cnv_data<-GetExampleData("cnv_data") #obtion amplified genes amp_gene<-GetExampleData("amp_gene") #obtion deleted genes del_gene<-GetExampleData(("del_gene")) ``` ```{R} #calculate p-values or corrected p-values for each gene cnv.p<-getCnvp(exp_data,cnv_data,amp_gene,del_gene,p.adjust=FALSE,method="fdr") cnv.p[1:10,] ``` ## 3 Example: Combine P-values of different kinds of data With the above data, we constructed an integrative gene z-score (Z) based three datasets. + Firstly,the gene score calculated through combined p-values of Fisher’s Inverse chi-square tests. This method computes a combined statistic S from the p-values of the difference coefficient obtained from three individual datasets .Usually,the statistic S follows a chi-square distribution with 2k degrees,we can calculate the null hypothesis p-value of the statistic S. + Secondly,we convert the p-value to z-score according to Gaussian distribution , which is taken as the gene z-score(Z) . ```{r} #obtain the p-values of expression profile data,methylation profile data and Copy number variation data exp.p<-GetExampleData("exp.p") meth.p<-GetExampleData("meth.p") cnv.p<-GetExampleData("cnv.p") ``` ```{r} #calculate z-scores for p-values of each kind of data zexp<-coverp2zscore(exp.p) zmeth<-coverp2zscore(meth.p) zcnv<-coverp2zscore(cnv.p) #combine two kinds of p-values,then,calculate z-score for them zz<-combinep_two(exp.p,meth.p) #combine three kinds of p-values,then,calculate z-score for them zzz<-combinep_three(exp.p,meth.p,cnv.p) zzz[1:6,] ``` ## 4 Example: Obtain subpathways For a priori KEGG path, we perform a greedy algorithm to search for interested subpathways.We provide two statistical test methods of the subpathway, which one is whole gene-based perturbation, and the other is the local gene perturbation in a particular pathway. ```{r} #obtain z-score of each gene zzz<-GetExampleData("zzz") zzz[1:10,] ``` ```{r eval= FALSE} require(graphite) zz<-GetExampleData("zzz") #subpathdata<-FindSubPath(zz) #only show ``` Optimize interested subpathways.If the number of genes shared by the two pathways accounted for more than the Overlap ratio of each pathway genes,than combine two pathways. ```{r} subpathdata<-GetExampleData("subpathdata") keysubpathways<-opt_subpath(subpathdata,zz,overlap=0.6) head(keysubpathways) ``` the perturbation test was used to calculate statistical significance for these interested subpathways. ```{r} keysubpathways<-Permutation(keysubpathways,zz,nperm1=100,method1=TRUE,nperm2=100,method2=FALSE) head(keysubpathways) ``` ## 4 Example: plot a network graph when user input a list of gene ```{r} require(graphite) require(org.Hs.eg.db) subpID<-unlist(strsplit("ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2","/")) pathway.name="Glycolysis / Gluconeogenesis" zzz<- GetExampleData("zzz") PlotSubpathway(subpID=subpID,pathway.name=pathway.name,zz=zzz) ```