---
title: "ICDS User Guide"
author: "Junwei Han"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ICDS User Guide} 
  %\VignetteEncoding{UTF-8} 
  %\VignetteEngine{knitr::rmarkdown}
---
```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
                      warning = FALSE,
                      message = FALSE)
```

## 1 Introduce
This vignette illustrates how to easily use the ICDS package. This package can identifying cancer risk subpathways which can explain the disturbed biological functions in more detail and accurately.

+  This package provides the
`getExpp`,`getMethp`,`getCnvp`function to calculate p-values or corrected p-values for each gene.

+   This package provides the `coverp2zscore`,`combinep_two`,`combinep_three` function to Convert p-values or corrected p-values to z-scores.

+  This package provides the `FindSubPath` function to search for interested subpathways in each entire pathway. 

+  This package provides the `opt_subpath` function to Optimize interested subpathways. 

+  This package provides the `Permutation` function to to calculate statistical significance for these interested subpathways
.

## 2 Example: Obtain p-values or corrected p-values for each gene
We can use function getExampleData to return example data and environment variables, such as the
data of exp_data.

```{r}
library(ICDS)
# obtain the expression profile data
exp_data<-GetExampleData("exp_data")
#view first six rows and six colmns of data
exp_data[1:6, 1:6]
```


```{r eval=TRUE}
#obtain the labels of the samples of the expression profile, the label vector is a vector of 0/1s,
# 0 represents the case sample and 1 represents the control sample
label1<-GetExampleData("label1")
#view first ten label
label1[1:10]
```
we used the Student's t-test to calculate the p-value for expression level and methylation level of each gene in the tumor and normal samples.label,0 represents normal samples;1 represents cancer samples.

```{r}
#calculate p-values or corrected p-values for each gene

exp.p<-getExpp(exp_data,label = label1,p.adjust = FALSE)
label2<-GetExampleData("label2")
meth_data<-GetExampleData("meth_data")
meth.p<-getMethp(meth_data,label = label2,p.adjust = FALSE)
exp.p[1:10,]

```

For copy number data, we can calculate p-value through the following way:
```{r}
#obtain Copy number variation data
cnv_data<-GetExampleData("cnv_data")
#obtion amplified genes
amp_gene<-GetExampleData("amp_gene")
#obtion deleted genes
del_gene<-GetExampleData(("del_gene"))

```

```{R}
#calculate p-values or corrected p-values for each gene
cnv.p<-getCnvp(exp_data,cnv_data,amp_gene,del_gene,p.adjust=FALSE,method="fdr")
cnv.p[1:10,]
```

## 3 Example: Combine P-values of different kinds of data
With the above data, we constructed an integrative gene z-score (Z) based three datasets.

+ Firstly,the gene score calculated through combined p-values of Fisher’s Inverse chi-square tests. This method computes a combined statistic S from the p-values of the difference coefficient obtained from three individual datasets .Usually,the statistic S follows a chi-square distribution with 2k degrees,we can calculate the null hypothesis p-value of the statistic S.

+ Secondly,we convert the p-value to z-score according to Gaussian distribution , which is taken as the gene z-score(Z) .

```{r}
#obtain the p-values of expression profile data,methylation profile data and Copy number variation data
exp.p<-GetExampleData("exp.p")
meth.p<-GetExampleData("meth.p")
cnv.p<-GetExampleData("cnv.p")

```


```{r}

#calculate z-scores for p-values of each kind of data
zexp<-coverp2zscore(exp.p)
zmeth<-coverp2zscore(meth.p)
zcnv<-coverp2zscore(cnv.p)
#combine two kinds of p-values,then,calculate z-score for them
zz<-combinep_two(exp.p,meth.p)
#combine three kinds of p-values,then,calculate z-score for them
zzz<-combinep_three(exp.p,meth.p,cnv.p)
zzz[1:6,]
```

## 4 Example: Obtain subpathways
For a priori KEGG path, we perform a greedy algorithm to search for interested subpathways.We provide two statistical test methods of the subpathway, which one is whole gene-based perturbation, and the other is the local gene perturbation in a particular pathway.
```{r}
#obtain z-score of each gene
zzz<-GetExampleData("zzz")
zzz[1:10,]

```

```{r eval= FALSE}
require(graphite)
zz<-GetExampleData("zzz")
#subpathdata<-FindSubPath(zz) #only show

```
Optimize interested subpathways.If the number of genes shared by the two pathways accounted for more than the Overlap ratio of each pathway genes,than combine two pathways.
```{r}
subpathdata<-GetExampleData("subpathdata")
keysubpathways<-opt_subpath(subpathdata,zz,overlap=0.6)
head(keysubpathways)
```
the perturbation test was used to calculate statistical significance for these interested subpathways.
```{r}
keysubpathways<-Permutation(keysubpathways,zz,nperm1=100,method1=TRUE,nperm2=100,method2=FALSE)
head(keysubpathways)
```

## 4 Example: plot a network graph when user input a list of gene

```{r}
require(graphite)
require(org.Hs.eg.db)
subpID<-unlist(strsplit("ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2","/"))
pathway.name="Glycolysis / Gluconeogenesis"
zzz<- GetExampleData("zzz")
PlotSubpathway(subpID=subpID,pathway.name=pathway.name,zz=zzz)
```