---
title: "Quick Example of Simulated Kinships with Partial Parentage"
subtitle: "nprcgenekeepr: an R Package for the Genetic Management of Colonies"
author: "R. Mark Sharp, Ph.D."
date: "8/29/2021"
output:
  html_document:
    df_print: paged
vignette: >
  %\VignetteEngine{knitr::rmarkdown_notangle} 
  %\VignetteIndexEntry{Simulated Kinships with Partial Parentage} 
  %\usepackage[UTF-8]{inputenc}
---

## Introduction

This vignette provides a quick example of how to estimate kinship coefficients
using simulation. One simulation uses 100 iterations and the other uses the 
same setup with 1000 simulations to illustrate the type of kinship coefficient
estimate variance you can expect with a simple pedigree exhibiting partial 
parentage for some of the members.

### Anticipation of further uses of this kinship estimation method

Retention of founder alleles is a primary driving force behind this package of
utilities. This example is intentionally overly simplistic to clearly illustrate
this one aspect of the methodology used with realistic pedigrees. 

In this example, some of the potential parents are not members of the example
pedigree and thus, by definition, have a kinship coeficient of 0.0 with all 
other pedigree members.

In practice most or all of the potential parents will be members of the same
pedigree and will potentially have nonzero kinship coeficients with other 
pedigree members.

#### Identification of potential parents

This tutorial assumes knowledge of potential parents and does not present 
methods for identifying potential parents.

<!-- RMS TODO: Add a link to a tutorial on this topic. -->

Capabilities to be address with resolution of Issue #28[^issue28] will provide
users the ability to fully automate the identification of potential parents. 
However, prior to that capability being available, users can use other methods
to identify potential parents.

[^issue28]: Add ability to use timestamped transactional location data to identify potential parents for animals with a missing parent.

### Creation of example pedigree[^not-realistic]

[^not-realistic]: This example is not realistic and particularly unrealistic
with regard to potential dams. While it is possible for two animals to be 
switched near the time of birth so that dam and sire are not know with certainty
and still have a limited number of potential parents. This is not a common
occurrence in colonies of which we are aware.

The example is based on the following simple pedigree setup. In the pedigree 
given below, all of the original animals have identifiers that are single
letters of the alphabet.:

Only those IDs that have unknown parents are included in the \code{simParent_n}
lists. For those IDs with one known parent, the known parent is included in the 
\code{simParent_n}. See \code{simParent_1} for ID \code{A}. 
Note also that potential parents
can either come from the pedigree being augmented by simulation or from outside
the pedigree. See \code{simParent_3}, \code{simParent_4}, \code{simParent_5},
and \code{simParent_6} for examples of this.

Currently, if you want to provide differential weights for the different 
potential parents, you will need to do this directly by modifying the
number of times each parent is included in the \code{simParent_n} list.

Animals \code{E}, \code{J}, and \code{N}
<!-- TODO explain the use of the simParent_n lists -->

```{r setup}
knitr::opts_chunk$set(echo = TRUE)
library(kableExtra)
library(magrittr)
library(nprcgenekeepr)
library(stringi)

ped <- nprcgenekeepr::smallPed
simParent_1 <- list( # nolint: object_name_linter
  id = "A", # nolint: object_name_linter
  sires = "Q",
  dams = c("d1_1", "d1_2", "d1_3", "d1_4")
)
simParent_2 <- list( # nolint: object_name_linter
  id = "B", # nolint: object_name_linter
  sires = c("s1_1", "s1_2", "s1_3"),
  dams = c("d1_1", "d1_2", "d1_3", "d1_4")
)
simParent_3 <- list( # nolint: object_name_linter
  id = "E", # nolint: object_name_linter
  sires = c("A", "C", "s1_1"),
  dams = c("d3_1", "B")
)
simParent_4 <- list( # nolint: object_name_linter
  id = "J", # nolint: object_name_linter
  sires = c("A", "C", "s1_1"),
  dams = c("d3_1", "B")
)
simParent_5 <- list( # nolint: object_name_linter
  id = "K", # nolint: object_name_linter
  sires = c("A", "C", "s1_1"),
  dams = c("d3_1", "d1_2")
)
simParent_6 <- list( # nolint: object_name_linter
  id = "N", # nolint: object_name_linter
  sires = c("A", "C", "s1_2"),
  dams = c("d3_1", "B")
)
allSimParents <- list(
  simParent_1, simParent_2, simParent_3,
  simParent_4, simParent_5, simParent_6
)

extractKinship <- function(simKinships, id1, id2, simulation) {
  ids <- dimnames(simKinships[[simulation]])[[1L]]
  simKinships[[simulation]][
    seq_along(ids)[ids == id1],
    seq_along(ids)[ids == id2]
  ]
}

extractKValue <- function(kValue, id1, id2, simulation) {
  kValue[kValue$id_1 == id1 & kValue$id_2 == id2, paste0("sim_", simulation)]
}
```


### Small Example

This is the simulation. I am only printing out rows with kinship values that
vary.

Before running these simulations, take time to look at the included function
descriptions to see what they are expecting as arguments and what they return.

```{r show-function-help}
#| eval: FALSE

?createSimKinships
?kinshipMatricesToKValues
?extractKValue
```
```{r small-simulation}
# Only set this seed if you want to get the same simulation results each time.
set.seed(1L)
n <- 10L
simKinships <- createSimKinships(ped, allSimParents, pop = ped$id, n = n)
kValues <- kinshipMatricesToKValues(simKinships)
extractKValue(kValues, id1 = "A", id2 = "F", simulation = 1L:n)
counts <- countKinshipValues(kValues)
counts$kinshipIds[1L:3L]
counts$kinshipValues[1L:3L]
counts$kinshipCounts[1L:3L]

stats_10 <- summarizeKinshipValues(counts)
nrow(stats_10[stats_10$sd > 0.0, ])

kable(stats_10[stats_10$sd > 0.0, ], longtable = TRUE) %>%
  kable_styling(
    latex_options = c("striped", "repeat_header"),
    repeat_header_method = "replace",
    repeat_header_text = "\\textit{(continued)}"
  )
```
A larger simulation

```{r larger-simulation}
set.seed(1L)
n <- 100L
simKinships <- createSimKinships(ped, allSimParents, pop = ped$id, n = n)
kValues <- kinshipMatricesToKValues(simKinships)
extractKValue(kValues, id1 = "A", id2 = "F", simulation = 1L:10L)
counts <- countKinshipValues(kValues)
counts$kinshipIds[1L:3L]
counts$kinshipValues[1L:3L]
counts$kinshipCounts[1L:3L]

stats_100 <- summarizeKinshipValues(counts)
nrow(stats_100[stats_100$sd > 0.0, ])

kable(stats_100[stats_100$sd > 0.0, ], longtable = TRUE) %>%
  kable_styling(
    latex_options = c("striped", "repeat_header"),
    repeat_header_method = "replace",
    repeat_header_text = "\\textit{(continued)}"
  )
```


A much larger simulation

```{r much-larger-simulation}
set.seed(1L)
n <- 1000L
simKinships <- createSimKinships(ped, allSimParents, pop = ped$id, n = n)
kValues <- kinshipMatricesToKValues(simKinships)
extractKValue(kValues, id1 = "A", id2 = "F", simulation = 1L:10L)
counts <- countKinshipValues(kValues)
counts$kinshipIds[1L:3L]
counts$kinshipValues[1L:3L]
counts$kinshipCounts[1L:3L]

stats_1000 <- summarizeKinshipValues(counts)
nrow(stats_1000[stats_1000$sd > 0.0, ])

kable(stats_1000[stats_1000$sd > 0.0, ], longtable = TRUE) %>%
  kable_styling(
    latex_options = c("striped", "repeat_header"),
    repeat_header_method = "replace",
    repeat_header_text = "\\textit{(continued)}"
  )
```


Comparing the values and variation found for the various kinship values:

```{r comparison-10-1000}
stats_short <- stats_10[stats_10$sd > 0.0, ]
stats_long <- stats_1000[stats_1000$sd > 0.0, ]

if (any(stats_short$id_1 != stats_long$id_1) ||
  any(stats_short$id_2 != stats_long$id_2)) {
  cat("At least one row represents a different animal pair")
}

comprison <- data.frame(
  id_1 = stats_short$id_1,
  id_2 = stats_short$id_2,
  meanKin_short = stats_short$mean,
  meanKin_long = stats_long$mean,
  meanKinDelta = abs(stats_short$mean - stats_long$mean),
  sdKin_short = stats_short$sd,
  sdKin_long = stats_long$sd,
  sdKinDelta = abs(stats_short$sd - stats_long$sd)
)

kable(comprison,
  longtable = TRUE,
  digits = c(0L, 0L, 4L, 4L, 4L, 4L, 4L, 4L),
  caption = stri_c(
    "Comparision of estimated kinships between simulations ",
    "of 10 (short) and 1000 (long)"
  )
) %>%
  kable_styling(
    latex_options = c("striped", "repeat_header"),
    repeat_header_method = "replace",
    repeat_header_text = "\\textit{(continued)}",
    font_size = 10L
  )
```

```{r comparison-100-1000}
stats_short <- stats_100[stats_100$sd > 0.0, ]
stats_long <- stats_1000[stats_1000$sd > 0.0, ]

if (any(stats_short$id_1 != stats_long$id_1) ||
  any(stats_short$id_2 != stats_long$id_2)) {
  cat("At least one row represents a different animal pair")
}

comprison <- data.frame(
  id_1 = stats_short$id_1,
  id_2 = stats_short$id_2,
  meanKin_short = stats_short$mean,
  meanKin_long = stats_long$mean,
  meanKinDelta = abs(stats_short$mean - stats_long$mean),
  sdKin_short = stats_short$sd,
  sdKin_long = stats_long$sd,
  sdKinDelta = abs(stats_short$sd - stats_long$sd)
)

kable(comprison,
  longtable = TRUE,
  digits = c(0L, 0L, 4L, 4L, 4L, 4L, 4L, 4L),
  caption = stri_c(
    "Comparision of estimated kinships between simulations ",
    "of 100 (short) and 1000 (long)"
  )
) %>%
  kable_styling(
    latex_options = c("striped", "repeat_header"),
    repeat_header_method = "replace",
    repeat_header_text = "\\textit{(continued)}",
    font_size = 10L
  )
```