| Title: | Constitution of Core Collections using Length of Encoded Attribute Values |
| Version: | 0.1.0 |
| Description: | Construct core collections using the information measure 'Length of Encoded Attribute Values' (LEAV) using qualitative and/or quantitative trait data as described by Balakrishnan and Suresh (2001a) https://indianjournals.com/article/ijpgr-14-1-006 and (2001b) https://indianjournals.com/article/ijpgr-14-3-005. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Encoding: | UTF-8 |
| BuildManual: | TRUE |
| Imports: | mathjaxr, Rdpack, dplyr, stats, stratification |
| Suggests: | EvaluateCore, knitr, rmarkdown, SampleCore, pander |
| RdMacros: | mathjaxr, Rdpack |
| Copyright: | 2024-2026, ICAR-NBPGR |
| URL: | https://github.com/aravind-j/LEAVcore https://aravind-j.github.io/LEAVcore/ |
| BugReports: | https://github.com/aravind-j/LEAVcore/issues |
| Config/roxygen2/version: | 8.0.0 |
| RoxygenNote: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-05-31 12:37:36 UTC; J. Aravind |
| Author: | J. Aravind |
| Maintainer: | J. Aravind <j.aravind@icar.org.in> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-04 12:00:13 UTC |
LEAVcore: Constitution of Core Collections using Length of Encoded Attribute Values
Description
Construct core collections using the information measure 'Length of Encoded Attribute Values' (LEAV) using qualitative and/or quantitative trait data as described by Balakrishnan and Suresh (2001a) https://indianjournals.com/article/ijpgr-14-1-006 and (2001b) https://indianjournals.com/article/ijpgr-14-3-005.
Author(s)
Maintainer: J. Aravind j.aravind@icar.org.in (ORCID)
Authors:
J. Aravind j.aravind@icar.org.in (ORCID)
Suman Roy suman.roy@icar.org.in (ORCID)
Anju Mahendru Singh anju.singh@icar.org.in (ORCID)
Other contributors:
ICAR-NBPGR (ROR) (url: https://nbpgr.org.in/) [copyright holder]
See Also
Useful links:
- https://github.com/aravind-j/LEAVcore
- https://aravind-j.github.io/LEAVcore/
- Report bugs at https://github.com/aravind-j/LEAVcore/issues
Compute the Length of Encoded Attribute Values
Description
For accessions in a collection, compute the Length of Encoded Attribute Values (LEAV) information measure from qualitative and quantitative trait data (Wallace and Boulton 1968; Balakrishnan and Suresh 2001a, 2001b; Balakrishnan and Nair 2003).
Usage
LEAV(
data,
names,
quantitative = NULL,
qualitative = NULL,
freq,
adj = TRUE,
mean,
sd,
e
)
Arguments
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
freq |
A named list with the target absolute frequencies of the
descriptor states for each qualitative trait specified in
|
adj |
logical. If |
mean |
A named numeric vector of target means for each quantitative
trait specified in |
sd |
A named numeric vector of target standard deviation for each
quantitative trait specified in |
e |
A named numeric vector of least count of measurement for each
quantitative trait specified in |
Details
For each accession \(s\) in the collection, the message length \(F_{s}\) required to optimally encode all \(p + q\) traits/descriptors is computed as follows, using the joint distribution of the whole collection.
\[F_{s} = l_{t} + \sum_{i=1}^{p} c_{m_{s},d_{i},t} + \sum_{j=1}^{q} c_{x_{s},d_{j},t}\]
Here, the first term \(l_{t}\) is the message length for the subset \(t\) to which an accession belongs, when there are \(N\) accessions in the whole collection and \(n_{t}\) accessions in subset \(t\).
\[l_{t} = -\ln{\left ( \frac{n_{t}}{N} \right )} = \ln{\left ( \frac{N}{n_{t}} \right )}\]
Similarly, \(\sum_{i=1}^{p} c_{m_{s},d_{i},t}\) is the sum of the optimum message lengths for the \(p\) qualitative traits and \(\sum_{j=1}^{q} c_{x_{s},d_{j},t}\) is the sum of the optimum message lengths for the \(q\) quantitative traits. See inflen.qual and inflen.quant for more details.
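The decomposition of \(F_{s}\) can be illustrated with a minimal base-R sketch. The helper `message_length()` and all the numbers are hypothetical; in practice the per-trait lengths come from inflen.qual and inflen.quant, and LEAV performs this sum itself.

```r
# Hypothetical helper (not part of LEAVcore): total message length F_s of
# one accession, given the collection size N, the subset size n_t and the
# per-trait optimum code lengths already computed for that accession.
message_length <- function(N, n_t, qual_lengths, quant_lengths) {
  # Subset term l_t = -ln(n_t / N) = ln(N / n_t), positive when n_t < N
  l_t <- log(N / n_t)
  # Add the code lengths of the p qualitative and q quantitative traits
  l_t + sum(qual_lengths) + sum(quant_lengths)
}

# Illustrative values: 500 accessions, a subset of 100,
# two qualitative traits and one quantitative trait
F_s <- message_length(N = 500, n_t = 100,
                      qual_lengths = c(1.2, 0.7),
                      quant_lengths = 2.1)
F_s
```

Here \(l_{t} = \ln 5 \approx 1.609\), so \(F_{s} \approx 5.609\).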
Value
A data frame with one row per accession in data and the following columns:
- names
Accession identifiers, as specified by the names argument.
- lt
The log-ratio message length term, \(\log(N / n)\), where \(N\) is the total number of accessions in data and \(n\) is the sum of frequencies in freq.
- <qualitative traits>
One column per trait specified in qualitative, giving the information length \(-\log(p_k)\) for the level \(k\) of that trait observed for each accession.
- <quantitative traits>
One column per trait specified in quantitative, giving the Gaussian information length \(\log(\sigma / (c\,\varepsilon)) + (x - \mu)^2 / (2\sigma^2)\) for each accession, where \(c = 1/\sqrt{2\pi}\).
- LEAV
The total information length for each accession, equal to the row sum of lt and all trait information length columns.
References
Balakrishnan R, Nair NV (2003).
“Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.”
Plant Genetic Resources Newsletter, 134, 33–41.
Balakrishnan R, Suresh KK (2001a).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.”
Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Balakrishnan R, Suresh KK (2001b).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.”
Indian Journal of Plant Genetic Resources, 14(3), 342–349.
Wallace CS, Boulton DM (1968).
“An information measure for classification.”
The Computer Journal, 11(2), 185–194.
See Also
Examples
suppressPackageStartupMessages(library(EvaluateCore))
# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
set.seed(123) # for a reproducible subsample
cassava_EC <- cassava_EC[sample(seq_len(nrow(cassava_EC)), 500), ]
cassava_EC <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
"ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
"ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
"PSTR")
cassava_EC[, qual] <- lapply(cassava_EC[, qual], as.factor)
size <- 0.2
freq_list <- lapply(qual, function(x) {
prop <- prop.adj(cassava_EC[, x], method = "sqrt")
size.count <- ceiling(size * nrow(cassava_EC)) # target core size (accessions)
round_preserve_sum(prop * size.count)
})
names(freq_list) <- qual
mean_vec <- sapply(cassava_EC[, quant],
function(x) {
floor(mean(x))
})
names(mean_vec) <- quant
sd_vec <- sapply(cassava_EC[quant],
function(x) {
round(sd(x), 1)
})
names(sd_vec) <- quant
e_vec <- rep(1, length(quant))
names(e_vec) <- quant
# Compute LEAV
LEAV_cassava <- LEAV(data = cassava_EC, names = "genotypes",
quantitative = quant, qualitative = qual,
freq = freq_list, adj = TRUE,
mean = mean_vec, sd = sd_vec, e = e_vec)
head(LEAV_cassava)
Generate Core collections with Length of Encoded Attribute Values
Description
Based on Length of Encoded Attribute Values (LEAV) (Wallace and Boulton 1968; Balakrishnan and Suresh 2001a, 2001b; Balakrishnan and Nair 2003) estimated from qualitative and/or quantitative trait data, core collections can be generated by the following three methods.
- Method I
Classification based on pre-determined diversity represented by LEAV estimates, implemented in LEAVcore1.
- Method II
Purposive selection of accessions with the highest rank of LEAV estimates, implemented in LEAVcore2.
- Method III
Stratified sampling of accessions from diversity groups/strata computed from LEAV estimates, partially implemented in LEAVcore3.
Usage
LEAVcore1(
data,
names,
quantitative = NULL,
qualitative = NULL,
size,
prop.adj = c("none", "log", "sqrt"),
e,
always.selected = NULL
)
LEAVcore2(
data,
names,
quantitative = NULL,
qualitative = NULL,
size,
prop.adj = c("none", "log", "sqrt"),
e,
always.selected = NULL
)
LEAVcore3(
data,
names,
quantitative = NULL,
qualitative = NULL,
size,
prop.adj = c("none", "log", "sqrt"),
e,
always.selected = NULL
)
Arguments
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
size |
The desired core set size as a proportion of the collection size. |
prop.adj |
The method for relative frequency transformation for
qualitative traits. Either |
e |
A named numeric vector of least count of measurement for each
quantitative trait specified in |
always.selected |
Names of accessions to be always included in the core set as a character vector. |
Details
Balakrishnan and Suresh (2001a, 2001b) describe three different methods of constructing core collections from estimates of Length of Encoded Attribute Values.
Method I: Classification based on pre-determined diversity represented by LEAV estimates
This is an objective classification scheme that assigns accessions to either a "core" or "non-core" group based on which group model they best fit.
The target frequency patterns for qualitative traits and distribution parameters for quantitative traits are determined first for the two groups: the Core and the Non-Core.
Target proportions for the core group are estimated from the base proportions of the qualitative trait levels. If required, these may be transformed according to the "prop.adj" argument to increase the representation of rare descriptor states. Target counts are then obtained by scaling these proportions to the target core size and capping each count at the frequency actually available in the collection. For the non-core group, the target frequencies are obtained by subtracting the core model's frequencies from the total counts of each trait level in the entire collection.
The target distribution for the core group is modeled by applying a Gaussian kernel density function to the quantitative trait data, scaled to the core size. The non-core parameters are set to the actual mean and standard deviation of the entire collection.
Based on these target values, the message length (\(F\)) of each accession is estimated under both models using LEAV. An accession is assigned to the core if \(F_{core} \leq F_{non-core}\). If more accessions are assigned than the target core size, the core is refined by ranking these accessions by \(F_{core}\) in ascending order and retaining only the best-fitting ones up to the target core size.
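The assignment and refinement rule of Method I can be sketched in base R as below. The function `assign_core()` and the input values are hypothetical; the sketch assumes the two message lengths have already been computed for every accession and ignores always.selected accessions.

```r
# Hypothetical sketch of the Method I decision rule (not LEAVcore's code).
# F_core, F_noncore: message lengths of each accession under the core and
# non-core group models; n_core: target core size in accessions.
assign_core <- function(F_core, F_noncore, n_core) {
  in_core <- F_core <= F_noncore
  if (sum(in_core) > n_core) {
    # Too many candidates: keep the n_core with the smallest F_core
    candidates <- which(in_core)
    keep <- candidates[order(F_core[candidates])][seq_len(n_core)]
    in_core <- seq_along(F_core) %in% keep
  }
  in_core
}

F_core    <- c(10.2, 12.5, 9.8, 15.0, 11.1)
F_noncore <- c(11.0, 12.0, 10.5, 14.0, 11.5)
assign_core(F_core, F_noncore, n_core = 2)
```

Accessions 1, 3 and 5 fit the core model at least as well as the non-core model, but only two are wanted, so accession 5 (largest \(F_{core}\) among them) is dropped.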
Method II: Purposive selection of accessions with highest rank of LEAV estimates
This is a directed selection method that captures the most unique and dispersed accessions to maximize diversity and reduce redundancy.
Here the LEAV index for every accession relative to the entire base collection is first estimated. The accessions are then ranked in descending order of their LEAV estimates. Finally, the core collection is constituted by selecting a pre-determined number of top-ranked accessions according to the "size" argument.
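A minimal sketch of the ranking step, assuming the LEAV estimates against the whole collection are already available and that the core size is rounded up with ceiling() as in the package examples. `select_top_leav()` is a hypothetical helper and omits always.selected handling.

```r
# Hypothetical sketch of Method II (not LEAVcore's code): keep the
# accessions with the highest LEAV, i.e. the most "costly to encode".
select_top_leav <- function(accessions, leav, size) {
  n_core <- ceiling(size * length(leav))            # target core size
  ranked <- accessions[order(leav, decreasing = TRUE)]
  ranked[seq_len(n_core)]
}

acc  <- paste0("A", 1:10)
leav <- c(31.2, 28.4, 40.1, 25.0, 36.7, 29.9, 33.3, 27.5, 38.8, 30.0)
select_top_leav(acc, leav, size = 0.2)
```

With 10 accessions and size = 0.2, the two accessions with the largest LEAV (A3 and A9) are selected.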
Method III: Stratified sampling of accessions from diversity groups/strata computed from LEAV estimates
This is a two-step approach that first organizes the collection into optimized diversity groups based on LEAV estimates followed by a group-wise representative sampling.
Here also the LEAV index for every accession relative to the entire base collection is first estimated. These estimates are then divided into \(L\) strata using the cumulative root frequency method of Dalenius and Hodges (1959), which approximates the stratification that minimizes the pooled within-stratum variance.
The number of entries to be sampled from each stratum is then determined followed by stratified selection from each group to reach the final core size.
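The cumulative root frequency rule can be sketched in base R as follows. This is an illustrative assumption of how Dalenius-Hodges boundaries are formed (equal-width classes, with boundaries placed where the cumulative \(\sqrt{f}\) is closest to equal shares of its total); `cum_root_f_strata()` is hypothetical, and LEAVcore3 may use the stratification package with different settings such as the number of classes.

```r
# Hypothetical sketch of Dalenius-Hodges cumulative sqrt(f) stratification
cum_root_f_strata <- function(x, L, nclass = 10) {
  width <- (max(x) - min(x)) / nclass
  # Equal-width classes 1..nclass (the maximum goes into the last class)
  cls <- pmin(floor((x - min(x)) / width) + 1, nclass)
  f <- tabulate(cls, nbins = nclass)     # class frequencies
  csf <- cumsum(sqrt(f))                 # cumulative root frequency
  # Equal shares of the total cumulative sqrt(f) for the L - 1 boundaries
  targets <- csf[nclass] * seq_len(L - 1) / L
  cut_class <- vapply(targets, function(target) which.min(abs(csf - target)),
                      integer(1))
  bounds <- min(x) + width * cut_class   # upper edges of the chosen classes
  # Stratum h holds values between boundaries h - 1 and h
  findInterval(x, bounds) + 1L
}

set.seed(1)
leav <- rnorm(200, mean = 60, sd = 8)    # simulated LEAV estimates
strata <- cum_root_f_strata(leav, L = 4)
table(strata)
```

Because \(\sqrt{f}\) grows slowly, dense regions of the LEAV distribution are split into narrower strata than sparse tails.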
In LEAVcore3, only the stratification based on LEAV estimates is
implemented. The downstream steps for allocation
(allocate.basic, allocate.diversity,
allocate.distance) and stratified selection
(select.random, select.diversity,
select.distance) are available from the sister package
SampleCore.
Value
LEAVcore1 returns a data frame with one row per accession in data and the following columns:
- names
Accession identifiers, as specified by the names argument.
- LEAV_core
The total LEAV score for each accession computed under the core group parameterisation (frequencies and moments estimated from the target core subset).
- LEAV_noncore
The total LEAV score for each accession computed under the non-core group parameterisation (frequencies and moments estimated from the remainder of the collection).
- always.selected
A logical vector indicating whether the accession was pre-specified in always.selected.
- core
A logical vector indicating whether the accession is selected into the core collection, either because LEAV_core ≤ LEAV_noncore (selected by the method) or because it appears in always.selected.
LEAVcore2 returns a data frame with one row per accession in data, sorted in decreasing order of LEAV score, with the following columns:
- names
Accession identifiers, as specified by the names argument.
- lt
The log-ratio message length term \(\log(N / n)\), where \(N\) is the total number of accessions in data and \(n\) is size.count.
- <trait columns>
One column per trait specified in qualitative and quantitative, giving the per-accession information length for that trait.
- LEAV
The total LEAV score for each accession, equal to the row sum of lt and all trait information length columns.
- always.selected
A logical vector indicating whether the accession was pre-specified in always.selected.
- core
A logical vector indicating whether the accession is selected into the core collection, either as one of the top size.count ranked accessions among those not in always.selected, or because it appears in always.selected.
LEAVcore3 returns a data frame with one row per accession in data, sorted in decreasing order of LEAV score, with the following columns:
- names
Accession identifiers, as specified by the names argument.
- lt
The log-ratio message length term \(\log(N / n)\), where \(N\) is the total number of accessions in data and \(n\) is size.count.
- <trait columns>
One column per trait specified in qualitative and quantitative, giving the per-accession information length for that trait.
- LEAV
The total LEAV score for each accession, equal to the row sum of lt and all trait information length columns.
- LEAVStrata
An integer stratum identifier assigned by the Dalenius-Hodges cumulative root frequency method (Dalenius and Hodges 1959), indicating the stratum to which each accession belongs for proportional sampling. NA for accessions in always.selected, which are excluded from stratification.
- always.selected
A logical vector indicating whether the accession was pre-specified in always.selected and is therefore excluded from stratification.
References
Balakrishnan R, Nair NV (2003).
“Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.”
Plant Genetic Resources Newsletter, 134, 33–41.
Balakrishnan R, Suresh KK (2001a).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.”
Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Balakrishnan R, Suresh KK (2001b).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.”
Indian Journal of Plant Genetic Resources, 14(3), 342–349.
Dalenius T, Hodges JL (1959).
“Minimum variance stratification.”
Journal of the American Statistical Association, 54(285), 88–101.
Wallace CS, Boulton DM (1968).
“An information measure for classification.”
The Computer Journal, 11(2), 185–194.
See Also
inflen.qual,
inflen.quant, LEAV,
allocate.basic,
allocate.distance,
allocate.diversity,
select.random,
select.distance,
select.diversity
Examples
suppressPackageStartupMessages(library(EvaluateCore))
# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
cassava_EC <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
"ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
"ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
"PSTR")
cassava_EC[, qual] <- lapply(cassava_EC[, qual], as.factor)
e_vec <- rep(1, length(quant))
names(e_vec) <- quant
mand_accns <-
c("TMe-2018", "TMe-801", "TMe-3191", "TMe-1830", "TMe-1790")
table(cassava_EC$genotypes %in% mand_accns)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Method I
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LEAVcore1_out <-
LEAVcore1(data = cassava_EC, names = "genotypes",
quantitative = quant, qualitative = qual,
size = 0.2, prop.adj = "log", e = e_vec,
always.selected = mand_accns)
head(LEAVcore1_out)
# Selected accessions for core
core1 <- LEAVcore1_out[LEAVcore1_out$core == TRUE, "genotypes"]
core1
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Method II
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LEAVcore2_out <-
LEAVcore2(data = cassava_EC, names = "genotypes",
quantitative = quant, qualitative = qual,
size = 0.2, prop.adj = "log", e = e_vec,
always.selected = mand_accns)
head(LEAVcore2_out)
# Selected accessions for core
core2 <- LEAVcore2_out[LEAVcore2_out$core == TRUE, "genotypes"]
core2
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Method III
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LEAVcore3_out <-
LEAVcore3(data = cassava_EC, names = "genotypes",
quantitative = quant, qualitative = qual,
size = 0.2, prop.adj = "log", e = e_vec,
always.selected = mand_accns)
head(LEAVcore3_out)
# Strata/Group-wise counts
table(LEAVcore3_out$LEAVStrata)
# Sample accessions from strata to form core set using SampleCore
suppressPackageStartupMessages(library(SampleCore))
# Append LEAV strata to original data
data <- merge.data.frame(cassava_EC,
LEAVcore3_out[, c("genotypes", "LEAVStrata",
"always.selected")],
by = "genotypes")
data$LEAVStrata <- as.factor(data$LEAVStrata)
# Use log allocation
log_alloc <-
allocate.basic(data = data[data$always.selected != TRUE, ],
names = "genotypes",
group = "LEAVStrata", method = "log",
size = 0.2)
# Use random selection
set.seed(123)
sel_random_out <-
select.random(data = data[data$always.selected != TRUE, ],
names = "genotypes",
group = "LEAVStrata", alloc = log_alloc,
# Already included in LEAVcore3_out
always.selected = NULL)
# Append always selected accessions
core3 <-
c(sel_random_out,
list(always.selected =
LEAVcore3_out[LEAVcore3_out$always.selected == TRUE,
"genotypes"]))
# Final core
core3
Compute Information Length for Qualitative Traits
Description
The function inflen.qual computes the length of the information code that
can indicate the possession of a descriptor state of a qualitative trait
(Wallace and Boulton 1968; Balakrishnan and Suresh 2001a, 2001b; Balakrishnan and Nair 2003).
Usage
inflen.qual(x, freq, adj = TRUE)
Arguments
x |
Data of a qualitative trait for accessions in a collection as a vector of type factor. |
freq |
The target absolute frequencies of the descriptor states of the
qualitative trait |
adj |
logical. If |
Details
For each qualitative trait/descriptor \(d\), the probability of occurrence of a descriptor state \(m\) in a subset \(t\) is estimated as
\[p_{m,d,t} = \frac{n_{m,d,t} + 1}{n_{d,t} + M_{d}}\]
where \(n_{m,d,t}\) is the number of accessions with state \(m\) of trait \(d\) in subset \(t\), \(n_{d,t}\) is the number of accessions with any known state of trait \(d\) in subset \(t\) (i.e. the number of accessions in subset \(t\)), and \(M_{d}\) is the number of descriptor states of trait \(d\).
This is a slightly biased estimate that allows descriptor states with zero frequency, for which \(-\ln p_{m,d,t}\) would otherwise be infinite, to be included in the computation. The unadjusted estimate is
\[p_{m,d,t} = \frac{n_{m,d,t}}{n_{d,t}}\]
Now the length of the information code that can optimally indicate the possession of descriptor state \(m\) of trait \(d\) in subset \(t\) is computed as
\[c_{m,d,t} = -\ln p_{m,d,t}\]
Value
A data frame with 2 columns:
- x
The qualitative trait data
- inflen
Information length computed
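The Details formulas can be checked with a small base-R sketch. `qual_code_length()` is a hypothetical helper written only to mirror the equations; it assumes `freq` is a named vector of target counts whose names match the factor levels of `x`, and it is not the source of inflen.qual.

```r
# Hypothetical re-implementation of the documented formula, for illustration:
# c = -ln(p), with p = (n_m + 1) / (n + M) when adj = TRUE, else p = n_m / n
qual_code_length <- function(x, freq, adj = TRUE) {
  x <- as.factor(x)
  n <- sum(freq)                # accessions in the subset
  M <- length(freq)             # number of descriptor states
  p <- if (adj) (freq + 1) / (n + M) else freq / n
  # Look up each accession's state probability by level name
  data.frame(x = x, inflen = unname(-log(p[as.character(x)])))
}

x    <- factor(c("green", "purple", "green", "yellow"))
freq <- c(green = 6, purple = 3, yellow = 1)
qual_code_length(x, freq)
```

With these target counts, \(n_{d,t} = 10\) and \(M_{d} = 3\), so the adjusted probability of "green" is 7/13 and its code length is \(\ln(13/7) \approx 0.619\); rarer states receive longer codes.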
References
Balakrishnan R, Nair NV (2003).
“Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.”
Plant Genetic Resources Newsletter, 134, 33–41.
Balakrishnan R, Suresh KK (2001a).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.”
Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Balakrishnan R, Suresh KK (2001b).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.”
Indian Journal of Plant Genetic Resources, 14(3), 342–349.
Wallace CS, Boulton DM (1968).
“An information measure for classification.”
The Computer Journal, 11(2), 185–194.
See Also
Examples
suppressPackageStartupMessages(library(EvaluateCore))
# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
# Data of 'Colour of unexpanded apical leaves' qualitative trait
CUAL <- as.factor(cassava_EC$CUAL)
# Get frequencies based on sample size
prop <- prop.adj(CUAL, method = "sqrt")
size.prop <- 0.2
size.count <- ceiling(size.prop * length(CUAL))
CUALfreq <- round(prop * size.count)
# Compute information length
CUALinflen <- inflen.qual(x = CUAL, freq = CUALfreq, adj = TRUE)
head(CUALinflen)
Compute Information Length for Quantitative Traits
Description
The function inflen.quant computes the length of the information code that
can indicate the possession of a specific value by a quantitative trait
(Wallace and Boulton 1968; Balakrishnan and Suresh 2001a, 2001b; Balakrishnan and Nair 2003).
Usage
inflen.quant(x, mean, sd, e = 1)
Arguments
x |
Data of a quantitative trait for accessions in a collection as a numeric vector. |
mean |
The target mean. |
sd |
The target standard deviation. |
e |
The least count of measurement for the quantitative trait (i.e. the accuracy of measurement). |
Details
For each quantitative trait \(d\), it is assumed that the trait is normally distributed within subset \(t\), with mean \(\mu_{d,t}\) and standard deviation \(\sigma_{d,t}\) estimated as below.
\[\mu_{d,t} = \frac{\sum_{s \in t} x_{d,s}}{n_{d,t}}\]
\[\sigma_{d,t} = \sqrt{\frac{\sum_{s \in t}(x_{d,s} - \mu_{d,t})^2}{n_{d,t} - 1}}\]
From this, a distribution normalizing constant \(g_{d,t}\) can be estimated as
\[g_{d,t} = \ln \left ( \frac{\sigma_{d,t}}{K \cdot \varepsilon_{d}} \right )\]
where \(K = \frac{1}{\sqrt{2\pi}}\) and \(\varepsilon_{d}\) is the least count of measurement of the descriptor \(d\), i.e. \(x\) is measured to an accuracy of \(\pm\varepsilon_{d}\).
The probability of obtaining a measurement \(x\) from a normal distribution with mean \(\mu_{d,t}\) and standard deviation \(\sigma_{d,t}\) is approximately
\[K \cdot \frac{\varepsilon_{d}}{\sigma_{d,t}} \cdot \exp\left( \frac{-(x_{d,s} - \mu_{d,t})^2}{2 \sigma_{d,t}^{2}} \right)\]
Taking the negative natural logarithm of this probability, the length of the information code that can optimally indicate the possession of a value \(x\) by the trait \(d\) is
\[c_{x,d,t} = g_{d,t} + \frac{(x_{d,s} - \mu_{d,t})^2}{2 \sigma_{d,t}^{2}}\]
Value
A data frame with 2 columns:
- x
The quantitative trait data
- inflen
Information length computed
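A hypothetical base-R sketch mirroring the Details equations; `quant_code_length()` is an illustrative helper, not the package source. The last line shows that \(c_{x,d,t}\) equals \(-\ln\) of \(\varepsilon_{d}\) times the normal density at \(x\), which is the approximation used above.

```r
# Hypothetical re-implementation of the documented formula, for illustration:
# c = ln(sigma / (K * e)) + (x - mu)^2 / (2 * sigma^2), with K = 1 / sqrt(2 * pi)
quant_code_length <- function(x, mean, sd, e = 1) {
  K <- 1 / sqrt(2 * pi)
  g <- log(sd / (K * e))            # normalizing constant g_{d,t}
  data.frame(x = x, inflen = g + (x - mean)^2 / (2 * sd^2))
}

out <- quant_code_length(x = c(1, 4, 9.5), mean = 4, sd = 3.25, e = 1)
out
# Same values as minus the log of (e times the normal density) at x
all.equal(out$inflen, -log(1 * dnorm(c(1, 4, 9.5), mean = 4, sd = 3.25)))
```

Values far from the target mean receive longer codes, and a coarser least count \(\varepsilon_{d}\) shortens every code by \(\ln \varepsilon_{d}\).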
References
Balakrishnan R, Nair NV (2003).
“Strategies for developing core collections of sugarcane (Saccharum officinarum L.) germplasm-comparison of sampling from diversity groups constituted by three different methods.”
Plant Genetic Resources Newsletter, 134, 33–41.
Balakrishnan R, Suresh KK (2001a).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.”
Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Balakrishnan R, Suresh KK (2001b).
“Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part III. Obtaining diversity groups based on an information measure.”
Indian Journal of Plant Genetic Resources, 14(3), 342–349.
Wallace CS, Boulton DM (1968).
“An information measure for classification.”
The Computer Journal, 11(2), 185–194.
See Also
Examples
suppressPackageStartupMessages(library(EvaluateCore))
# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
# Data of 'Average plant weight' quantitative trait
AVPW <- cassava_EC$AVPW
# Compute information length
AVPWinflen <- inflen.quant(x = AVPW, mean = 4, sd = 3.25, e = 1)
head(AVPWinflen)
Relative Frequency Adjustments
Description
Compute and transform relative frequencies for a qualitative trait in a germplasm collection by the following methods (Balakrishnan and Suresh 2001):
- Square root-proportion
- Log-frequency
Usage
prop.adj(x, method = c("none", "log", "sqrt"), size.count = NULL)
Arguments
x |
Data of a qualitative trait for accessions in a collection as a vector of type factor. |
method |
The method for transformation. Either |
size.count |
A positive integer specifying the target size of the core
collection. The sum of frequencies allocated across levels of each
qualitative trait will not exceed this value, and serves as the upper bound
for iterative proportion clamping when |
Details
If \(p_{i}\) is the relative frequency of the \(i\)th descriptor state for a qualitative trait in a collection, then the square root-proportion transformed relative frequency \(q_{i}\) is computed as
\[q_{i} = \frac{\sqrt{p_{i}}}{\sum_{i=1}^{s}\sqrt{p_{i}}}\]
where \(s\) is the number of possible descriptor states for the qualitative trait in the collection.
Similarly, the log-frequency transformed relative frequency \(q_{i}\) is computed as
\[q_{i} = \frac{\log(F_{i} + k)}{\sum_{i=1}^{s}\log(F_{i} + k)}\]
where \(F_{i}\) is the absolute frequency of the \(i\)th descriptor state for a qualitative trait in a collection. It is incremented by a constant \(k = 0.000001\) prior to log transformation. This ensures that singleton descriptor states (where \(F_{i} = 1\)) yield a small but non-zero proportion rather than being assigned a zero proportion due to \(\log(1) = 0\), which would otherwise exclude all accessions of that descriptor state from core selection irrespective of size.count.
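The two transformations can be sketched in base R as below, assuming `counts` holds the absolute frequency of every descriptor state and that each is at least 1 (with \(F_{i} = 0\), \(\log(0 + k)\) would be negative). The helper names `sqrt_prop()` and `log_prop()` are hypothetical and are not the source of prop.adj.

```r
# Hypothetical sketch of the two transformations (not LEAVcore's code).
# counts: absolute frequency F_i of each descriptor state (all >= 1 assumed)
sqrt_prop <- function(counts) {
  p <- counts / sum(counts)          # raw relative frequencies p_i
  sqrt(p) / sum(sqrt(p))
}
log_prop <- function(counts, k = 1e-6) {
  lf <- log(counts + k)              # k keeps singletons above zero
  lf / sum(lf)
}

counts <- c(A = 64, B = 16, C = 4, D = 1)
round(sqrt_prop(counts), 3)
round(log_prop(counts), 3)
```

With frequencies 64, 16, 4 and 1, the square root-proportion transform raises the rarest state's share from 1/85 to 1/15, while the log-frequency transform gives the singleton only a tiny positive share.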
When size.count is supplied, the transformed proportions
\(q_{i}\) are subject to iterative clamping to ensure that the implied
frequency \(q_{i} \times n\) for any descriptor state \(i\) does
not exceed its actual count in the collection, where \(n\) is
size.count. Excess proportion from clamped states is redistributed
proportionally among unclamped states and the process repeats until no state
exceeds its maximum allowable proportion \(F_{i} / n\).
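The clamping loop can be sketched as follows. `clamp_props()` is a hypothetical helper; the sketch assumes the input proportions sum to one and that the collection holds at least size.count accessions, so that the caps \(F_{i} / n\) sum to at least one.

```r
# Hypothetical sketch of iterative proportion clamping (not LEAVcore's code).
# q: transformed proportions; counts: actual counts F_i; n: size.count
clamp_props <- function(q, counts, n) {
  cap <- counts / n                   # maximum allowable proportion F_i / n
  clamped <- rep(FALSE, length(q))
  repeat {
    over <- !clamped & q > cap
    if (!any(over)) break
    clamped <- clamped | over
    q[clamped] <- cap[clamped]        # fix clamped states at their cap
    free <- !clamped
    if (!any(free)) break
    # Share the remaining proportion among unclamped states, keeping ratios
    q[free] <- q[free] / sum(q[free]) * (1 - sum(q[clamped]))
  }
  q
}

q_adj <- clamp_props(q = c(0.5, 0.3, 0.2), counts = c(50, 10, 40), n = 50)
q_adj
q_adj * 50                            # implied target frequencies
```

The second state can contribute at most 10 accessions (a proportion of 0.2), so its excess 0.1 is redistributed to the other two states in the ratio 0.5 : 0.2.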
Value
The relative frequencies as a named numeric vector.
References
Balakrishnan R, Suresh KK (2001). “Strategies for developing core collections of safflower (Carthamus tinctorius L.) germplasm-part II. Using an information measure for obtaining a core sample with pre-determined diversity levels for several descriptors simultaneously.” Indian Journal of Plant Genetic Resources, 14(1), 32–42.
Examples
suppressPackageStartupMessages(library(EvaluateCore))
# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
# Data of 'Colour of unexpanded apical leaves' qualitative trait
CUAL <- as.factor(cassava_EC$CUAL)
# Raw relative frequencies
prop.adj(CUAL, method = "none")
# Square root-proportion transformed relative frequencies
prop.adj(CUAL, method = "sqrt")
# Log-frequency transformed relative frequencies
prop.adj(CUAL, method = "log")
Round Numeric Values While Preserving a Target Sum
Description
Applies the Hamilton rounding method (also known as the largest remainder, Hare-Niemeyer or Vinton method) (Balinski and Young 2001) to a numeric vector so that the rounded values sum to a specified target.
Usage
round_preserve_sum(x, target = round(sum(x)))
Arguments
x |
A numeric vector to round. |
target |
A numeric scalar giving the desired sum of the rounded values.
Defaults to |
Details
Values are first rounded down using floor(), and the remaining deficit
is allocated to elements with the largest fractional parts.
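A base-R sketch of the same largest-remainder idea, written only for illustration. `hamilton_round()` is hypothetical and assumes the deficit `target - sum(floor(x))` lies between 0 and `length(x)`.

```r
# Hypothetical sketch of largest-remainder (Hamilton) rounding:
# floor everything, then hand out the remaining units to the
# elements with the largest fractional parts.
hamilton_round <- function(x, target = round(sum(x))) {
  base <- floor(x)
  deficit <- target - sum(base)       # units still to allocate
  frac <- x - base
  if (deficit > 0) {
    idx <- order(frac, decreasing = TRUE)[seq_len(deficit)]
    base[idx] <- base[idx] + 1
  }
  base
}

hamilton_round(c(1.2, 2.7, 3.5))
hamilton_round(c(10.4, 10.4, 10.2), target = 32)
```

For c(1.2, 2.7, 3.5), the floors sum to 6 against a target of 7, so the single remaining unit goes to 2.7, which has the largest fractional part, giving c(1, 3, 3).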
Value
A numeric vector of the same length as x, whose elements
sum to target.
References
Balinski ML, Young HP (2001). Fair Representation: Meeting the Ideal of One Man, One Vote. Brookings Institution Press. ISBN 978-0-8157-0111-8.
Examples
round_preserve_sum(c(1.2, 2.7, 3.5))
round_preserve_sum(c(10.4, 10.4, 10.2), target = 32)