--- title: "Quick Example of Simulated Kinships with Partial Parentage" subtitle: "nprcgenekeepr: an R Package for the Genetic Management of Colonies" author: "R. Mark Sharp, Ph.D." date: "8/29/2021" output: html_document: df_print: paged vignette: > %\VignetteEngine{knitr::rmarkdown_notangle} %\VignetteIndexEntry{Simulated Kinships with Partial Parentage} %\usepackage[UTF-8]{inputenc} --- ## Introduction This vignette provides a quick example of how to estimate kinship coefficients using simulation. One simulation uses 100 iterations and the other uses the same setup with 1000 simulations to illustrate the type of kinship coefficient estimate variance you can expect with a simple pedigree exhibiting partial parentage for some of the members. ### Anticipation of further uses of this kinship estimation method Retention of founder alleles is a primary driving force behind this package of utilities. This example is intentionally overly simplistic to clearly illustrate this one aspect of the methodology used with realistic pedigrees. In this example, some of the potential parents are not members of the example pedigree and thus, by definition, have a kinship coeficient of 0.0 with all other pedigree members. In practice most or all of the potential parents will be members of the same pedigree and will potentially have nonzero kinship coeficients with other pedigree members. #### Identification of potential parents This tutorial assumes knowledge of potential parents and does not present methods for identifying potential parents. Capabilities to be address with resolution of Issue #28[^issue28] will provide users the ability to fully automate the identification of potential parents. However, prior to that capability being available, users can use other methods to identify potential parents. [^issue28]: Add ability to use timestamped transactional location data to identify potential parents for animals with a missing parent. ### Creation of example pedigree[^not-realistic] [^not-realistic]: This example is not realistic and particularly unrealistic with regard to potential dams. While it is possible for two animals to be switched near the time of birth so that dam and sire are not know with certainty and still have a limited number of potential parents. This is not a common occurrence in colonies of which we are aware. The example is based on the following simple pedigree setup. In the pedigree given below, all of the original animals have identifiers that are single letters of the alphabet.: Only those IDs that have unknown parents are included in the \code{simParent_n} lists. For those IDs with one known parent, the known parent is included in the \code{simParent_n}. See \code{simParent_1} for ID \code{A}. Note also that potential parents can either come from the pedigree being augmented by simulation or from outside the pedigree. See \code{simParent_3}, \code{simParent_4}, \code{simParent_5}, and \code{simParent_6} for examples of this. Currently, if you want to provide differential weights for the different potential parents, you will need to do this directly by modifying the number of times each parent is included in the \code{simParent_n} list. Animals \code{E}, \code{J}, and \code{N} ```{r setup} knitr::opts_chunk$set(echo = TRUE) library(kableExtra) library(magrittr) library(nprcgenekeepr) library(stringi) ped <- nprcgenekeepr::smallPed simParent_1 <- list( # nolint: object_name_linter id = "A", # nolint: object_name_linter sires = "Q", dams = c("d1_1", "d1_2", "d1_3", "d1_4") ) simParent_2 <- list( # nolint: object_name_linter id = "B", # nolint: object_name_linter sires = c("s1_1", "s1_2", "s1_3"), dams = c("d1_1", "d1_2", "d1_3", "d1_4") ) simParent_3 <- list( # nolint: object_name_linter id = "E", # nolint: object_name_linter sires = c("A", "C", "s1_1"), dams = c("d3_1", "B") ) simParent_4 <- list( # nolint: object_name_linter id = "J", # nolint: object_name_linter sires = c("A", "C", "s1_1"), dams = c("d3_1", "B") ) simParent_5 <- list( # nolint: object_name_linter id = "K", # nolint: object_name_linter sires = c("A", "C", "s1_1"), dams = c("d3_1", "d1_2") ) simParent_6 <- list( # nolint: object_name_linter id = "N", # nolint: object_name_linter sires = c("A", "C", "s1_2"), dams = c("d3_1", "B") ) allSimParents <- list( simParent_1, simParent_2, simParent_3, simParent_4, simParent_5, simParent_6 ) extractKinship <- function(simKinships, id1, id2, simulation) { ids <- dimnames(simKinships[[simulation]])[[1L]] simKinships[[simulation]][ seq_along(ids)[ids == id1], seq_along(ids)[ids == id2] ] } extractKValue <- function(kValue, id1, id2, simulation) { kValue[kValue$id_1 == id1 & kValue$id_2 == id2, paste0("sim_", simulation)] } ``` ### Small Example This is the simulation. I am only printing out rows with kinship values that vary. Before running these simulations, take time to look at the included function descriptions to see what they are expecting as arguments and what they return. ```{r show-function-help} #| eval: FALSE ?createSimKinships ?kinshipMatricesToKValues ?extractKValue ``` ```{r small-simulation} # Only set this seed if you want to get the same simulation results each time. set.seed(1L) n <- 10L simKinships <- createSimKinships(ped, allSimParents, pop = ped$id, n = n) kValues <- kinshipMatricesToKValues(simKinships) extractKValue(kValues, id1 = "A", id2 = "F", simulation = 1L:n) counts <- countKinshipValues(kValues) counts$kinshipIds[1L:3L] counts$kinshipValues[1L:3L] counts$kinshipCounts[1L:3L] stats_10 <- summarizeKinshipValues(counts) nrow(stats_10[stats_10$sd > 0.0, ]) kable(stats_10[stats_10$sd > 0.0, ], longtable = TRUE) %>% kable_styling( latex_options = c("striped", "repeat_header"), repeat_header_method = "replace", repeat_header_text = "\\textit{(continued)}" ) ``` A larger simulation ```{r larger-simulation} set.seed(1L) n <- 100L simKinships <- createSimKinships(ped, allSimParents, pop = ped$id, n = n) kValues <- kinshipMatricesToKValues(simKinships) extractKValue(kValues, id1 = "A", id2 = "F", simulation = 1L:10L) counts <- countKinshipValues(kValues) counts$kinshipIds[1L:3L] counts$kinshipValues[1L:3L] counts$kinshipCounts[1L:3L] stats_100 <- summarizeKinshipValues(counts) nrow(stats_100[stats_100$sd > 0.0, ]) kable(stats_100[stats_100$sd > 0.0, ], longtable = TRUE) %>% kable_styling( latex_options = c("striped", "repeat_header"), repeat_header_method = "replace", repeat_header_text = "\\textit{(continued)}" ) ``` A much larger simulation ```{r much-larger-simulation} set.seed(1L) n <- 1000L simKinships <- createSimKinships(ped, allSimParents, pop = ped$id, n = n) kValues <- kinshipMatricesToKValues(simKinships) extractKValue(kValues, id1 = "A", id2 = "F", simulation = 1L:10L) counts <- countKinshipValues(kValues) counts$kinshipIds[1L:3L] counts$kinshipValues[1L:3L] counts$kinshipCounts[1L:3L] stats_1000 <- summarizeKinshipValues(counts) nrow(stats_1000[stats_1000$sd > 0.0, ]) kable(stats_1000[stats_1000$sd > 0.0, ], longtable = TRUE) %>% kable_styling( latex_options = c("striped", "repeat_header"), repeat_header_method = "replace", repeat_header_text = "\\textit{(continued)}" ) ``` Comparing the values and variation found for the various kinship values: ```{r comparison-10-1000} stats_short <- stats_10[stats_10$sd > 0.0, ] stats_long <- stats_1000[stats_1000$sd > 0.0, ] if (any(stats_short$id_1 != stats_long$id_1) || any(stats_short$id_2 != stats_long$id_2)) { cat("At least one row represents a different animal pair") } comprison <- data.frame( id_1 = stats_short$id_1, id_2 = stats_short$id_2, meanKin_short = stats_short$mean, meanKin_long = stats_long$mean, meanKinDelta = abs(stats_short$mean - stats_long$mean), sdKin_short = stats_short$sd, sdKin_long = stats_long$sd, sdKinDelta = abs(stats_short$sd - stats_long$sd) ) kable(comprison, longtable = TRUE, digits = c(0L, 0L, 4L, 4L, 4L, 4L, 4L, 4L), caption = stri_c( "Comparision of estimated kinships between simulations ", "of 10 (short) and 1000 (long)" ) ) %>% kable_styling( latex_options = c("striped", "repeat_header"), repeat_header_method = "replace", repeat_header_text = "\\textit{(continued)}", font_size = 10L ) ``` ```{r comparison-100-1000} stats_short <- stats_100[stats_100$sd > 0.0, ] stats_long <- stats_1000[stats_1000$sd > 0.0, ] if (any(stats_short$id_1 != stats_long$id_1) || any(stats_short$id_2 != stats_long$id_2)) { cat("At least one row represents a different animal pair") } comprison <- data.frame( id_1 = stats_short$id_1, id_2 = stats_short$id_2, meanKin_short = stats_short$mean, meanKin_long = stats_long$mean, meanKinDelta = abs(stats_short$mean - stats_long$mean), sdKin_short = stats_short$sd, sdKin_long = stats_long$sd, sdKinDelta = abs(stats_short$sd - stats_long$sd) ) kable(comprison, longtable = TRUE, digits = c(0L, 0L, 4L, 4L, 4L, 4L, 4L, 4L), caption = stri_c( "Comparision of estimated kinships between simulations ", "of 100 (short) and 1000 (long)" ) ) %>% kable_styling( latex_options = c("striped", "repeat_header"), repeat_header_method = "replace", repeat_header_text = "\\textit{(continued)}", font_size = 10L ) ```