ympes

ympes provides a collection of lightweight helper functions (imps) both for interactive use and for inclusion within other packages. It’s my attempt to save some functionality that would otherwise get lost in a script somewhere on my computer. To that end it’s a bit of a hodgepodge of things that I’ve found useful at one time or another and, more importantly, remembered to include here!

library(ympes)

Visualising palettes

I often want to quickly see what a palette looks like to ensure I can distinguish the different colours. The imaginatively named plot_palette() provides a quick overview:

plot_palette(c("#5FE756", "red", "black"))

A plot with 3 rectangular regions, coloured green, red and black.

We can make the plot square(ish) by setting the argument square = TRUE. A nice side effect of this is that the labels are automatically adjusted to account for the underlying colour:

plot_palette(palette.colors(palette = "R4"), square = TRUE)

A plot of the 8 colours that define the ‘R4’ palette. The plot is divided into a 3 by 3 grid (one square is left blank).
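
As an illustration of how a label colour might be chosen to suit the underlying fill, here is a minimal sketch. It is not the ympes implementation: the function name label_colour_sketch(), the luminance weights and the 0.5 threshold are all assumptions made for the example.

```r
# Hypothetical helper (not part of ympes): pick black or white text
# depending on how light the background colour is.
label_colour_sketch <- function(cols) {
    # col2rgb() returns a 3-row matrix (red, green, blue) on a 0-255 scale
    rgb <- grDevices::col2rgb(cols) / 255
    # Simple weighted luminance; the weights are an assumption for this sketch
    lum <- 0.2126 * rgb["red", ] + 0.7152 * rgb["green", ] + 0.0722 * rgb["blue", ]
    # Light backgrounds get black labels, dark backgrounds get white ones
    unname(ifelse(lum > 0.5, "black", "white"))
}

label_colour_sketch(c("#5FE756", "red", "black"))
#> [1] "black" "white" "white"
```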

Finding strings

Sometimes you just want to find rows of a data frame where a particular string occurs. greprows() searches for pattern matches within a data frame’s columns and returns the matching rows or row indices. It is a thin wrapper around an approach that combines subsetting, lapply(), Reduce() and grep().

dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
greprows(dat, "A|b")
#> [1]  2 26

grepvrows() is identical to greprows() except with the default value = TRUE.

grepvrows(dat, "A|b")
#>    first second third
#> 2      b      Y     Q
#> 26     z      A     Q
greprows(dat,  "A|b", value = TRUE)
#>    first second third
#> 2      b      Y     Q
#> 26     z      A     Q

greplrows() returns a logical vector (match or not for each row of dat).

greplrows(dat, "A|b", ignore.case = TRUE)
#>  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25]  TRUE  TRUE
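
For the curious, the lapply and Reduce idea behind these functions can be sketched in a few lines of base R. This is only a rough approximation under my own assumptions, not the ympes source; the name grep_rows_sketch() is hypothetical.

```r
# Hypothetical helper (not part of ympes): search every column of a data
# frame for a pattern and combine the per-column results row-wise.
grep_rows_sketch <- function(dat, pattern, value = FALSE, ...) {
    # One logical vector per column; as.character() so factors match on labels
    hits <- lapply(dat, function(col) grepl(pattern, as.character(col), ...))
    # A row is kept if any of its columns matched
    keep <- Reduce(`|`, hits)
    if (value) dat[keep, , drop = FALSE] else which(keep)
}

dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
grep_rows_sketch(dat, "A|b")
#> [1]  2 26
```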

Capturing strings

One of my favourite functions in R is strcapture(). This function allows you to extract the captured elements of a regular expression into a tabular data structure. Being able to parse input strings from a file and correctly split them into the columns of a data frame in a single function call feels so elegant.

To illustrate this, we generate some synthetic movement data which we pretend to have loaded in from a file. Each entry has the form “Name-Direction-Value”, with the first two components being character strings and the last an integer value.

movements <- function(length) {
    x <- lapply(
        list(c("Bob", "Mary", "Rose"), c("Up", "Down", "Right", "Left"), 1:10),
        sample,
        size = length,
        replace = TRUE
    )
    do.call(paste, c(x, sep = "-"))
}

# just a small sample to begin with
(dat <- movements(3))
#> [1] "Rose-Down-5" "Bob-Up-10"   "Rose-Left-5"
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto   <- data.frame(Name = "", Direction = "", Value = 1L)
strcapture(pattern, dat, proto = proto, perl = TRUE)
#>   Name Direction Value
#> 1 Rose      Down     5
#> 2  Bob        Up    10
#> 3 Rose      Left     5

For small (define as you wish) data sets this works fine. Unfortunately, as the number of entries increases, performance degrades (see https://bugs.r-project.org/show_bug.cgi?id=18728 for a more detailed analysis). fstrcapture() attempts to improve upon this by utilising an approach I saw implemented by Toby Hocking in the nc package, in the function nc::capture_first_vec().

# Now a larger number of strings
dat <- movements(1e5)
(t  <- system.time(r <- strcapture(pattern, dat, proto = proto, perl = TRUE)))
#>    user  system elapsed 
#>   0.786   0.029   0.817 
(t2 <- system.time(r2 <- fstrcapture(dat, pattern, proto = proto)))
#>    user  system elapsed 
#>   0.020   0.000   0.021 
t[["elapsed"]] / t2[["elapsed"]]
#> [1] 38.90476

As well as the improved performance you will notice two other differences between the two function signatures. Firstly, to make things more pipeable, the data parameter x appears before the pattern parameter. Secondly, fstrcapture() works only with Perl-compatible regular expressions.
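
To give a flavour of how capture groups can be extracted in a vectorised way, here is a minimal sketch built on regexpr() with perl = TRUE, which exposes the group positions via the "capture.start" and "capture.length" attributes. It is an assumption-laden illustration rather than the code used by fstrcapture() or nc; the name capture_sketch() is hypothetical.

```r
# Hypothetical helper (not part of ympes or nc): extract capture groups
# for a whole character vector using a single regexpr() call.
capture_sketch <- function(x, pattern, proto) {
    m <- regexpr(pattern, x, perl = TRUE)
    # With perl = TRUE these are matrices: one row per input, one column per group
    start <- attr(m, "capture.start")
    end <- start + attr(m, "capture.length") - 1L
    out <- lapply(seq_len(ncol(start)), function(j) {
        s <- substring(x, start[, j], end[, j])
        # Inputs that did not match at all become NA
        s[m == -1L] <- NA_character_
        # Coerce to the type of the corresponding prototype column
        as.vector(s, mode = typeof(proto[[j]]))
    })
    names(out) <- names(proto)
    as.data.frame(out, stringsAsFactors = FALSE)
}

pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto <- data.frame(Name = "", Direction = "", Value = 1L)
res <- capture_sketch(c("Rose-Down-5", "Bob-Up-10", "Rose-Left-5"), pattern, proto)
str(res)
```

The work is done by a single vectorised substring() call per capture group, so the sketch avoids looping over individual strings.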

Combining values for lazy people

cc() is for those of us that get fed up typing quotation marks. It accepts either comma-separated, unquoted names that you wish to quote, or a length-one character vector that you wish to split on whitespace. It is intended mainly for interactive use, and an example is likely more enlightening than my description:

cc(dale, audrey, laura, hawk)
#> [1] "dale"   "audrey" "laura"  "hawk"  
cc("dale audrey laura hawk")
#> [1] "dale"   "audrey" "laura"  "hawk"  
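
If you are wondering how unquoted names can be turned into strings, the trick is non-standard evaluation. The following is a minimal sketch assuming a substitute() and deparse() based approach; it is not the ympes source and cc_sketch() is a hypothetical name.

```r
# Hypothetical helper (not part of ympes): quote unquoted names, or split
# a single string on whitespace.
cc_sketch <- function(...) {
    # Capture the arguments as unevaluated expressions
    args <- as.list(substitute(list(...)))[-1L]
    # A single string literal is split on runs of whitespace
    if (length(args) == 1L && is.character(args[[1L]])) {
        return(strsplit(trimws(args[[1L]]), "[[:space:]]+")[[1L]])
    }
    # Otherwise each expression is converted to its text representation
    vapply(args, deparse, character(1))
}

cc_sketch(dale, audrey, laura, hawk)
#> [1] "dale"   "audrey" "laura"  "hawk"
cc_sketch("dale audrey laura hawk")
#> [1] "dale"   "audrey" "laura"  "hawk"
```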

Avoid overwriting data frame columns

Sometimes I find myself needing to add a temporary variable to a data frame without kiboshing a variable already present. new_name() provides a simple wrapper around tempfile() that generates random column names and checks for their suitability. Not normally the sort of thing I’d wrap, but I find myself writing the same code a lot, so here we are:

new_name(mtcars)
#> [1] "new7a105ba2ce89"
new_name(mtcars, 3L)
#> [1] "new7a1075070a36" "new7a102a5e0ac3" "new7a1010eb66f0"
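
The underlying idea is simple enough to sketch in base R. The version below is my own minimal approximation, assuming a retry loop around tempfile(); it is not the ympes source and new_name_sketch() is a hypothetical name.

```r
# Hypothetical helper (not part of ympes): generate n random names that do
# not clash with the existing column names of a data frame.
new_name_sketch <- function(dat, n = 1L) {
    repeat {
        # tempfile() is vectorised over pattern; basename() drops the directory
        candidates <- basename(tempfile(pattern = rep("new", n)))
        # Accept only if the names are unique and unused in dat
        ok <- !anyDuplicated(candidates) && !any(candidates %in% names(dat))
        if (ok) return(candidates)
    }
}

nm <- new_name_sketch(mtcars, 3L)
```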

Assertions (Experimental)

What better place for yet another implementation of bespoke assertion functions than a small helper package! These are motivated by vctrs::vec_assert() but have lower overhead at the cost of less flexibility. The assertion functions in ympes are designed to make it easy to identify the top-level calling function, whether used within a user-facing function or internally. They are somewhat experimental in nature and should be treated accordingly!

Currently implemented are:

assert_character(), assert_chr(), assert_character_not_na(), assert_chr_not_na(), assert_scalar_character(), assert_scalar_chr(), assert_scalar_character_not_na(), assert_scalar_chr_not_na(), assert_string(), assert_string_not_na(),

assert_double(), assert_dbl(), assert_double_not_na(), assert_dbl_not_na(), assert_scalar_double(), assert_scalar_dbl(), assert_scalar_double_not_na(), assert_scalar_dbl_not_na(),

assert_integer(), assert_int(), assert_integer_not_na(), assert_int_not_na(), assert_scalar_integer(), assert_scalar_int(), assert_scalar_integer_not_na(), assert_scalar_int_not_na(), assert_integerish(), assert_whole(), assert_scalar_whole(), assert_scalar_integerish(),

assert_logical(), assert_lgl(), assert_logical_not_na(), assert_lgl_not_na(), assert_scalar_logical(), assert_scalar_lgl(), assert_scalar_logical_not_na(), assert_scalar_lgl_not_na(), assert_bool(), assert_boolean(),

assert_list(), assert_data_frame(),

assert_negative(), assert_negative_or_na(), assert_positive(), assert_positive_or_na(), assert_non_negative(), assert_non_negative_or_na(), assert_non_positive(), assert_non_positive_or_na(),

assert_numeric(), assert_num(), assert_numeric_not_na(), assert_num_not_na(), assert_scalar_numeric(), assert_scalar_num(), assert_scalar_numeric_not_na(), assert_scalar_num_not_na(),

assert_between()

Hopefully most of these are self-explanatory but there is some opinionated (currently undocumented) handling of NA so care should be taken to inspect the underlying source code before using.

Currently these assertions return NULL if the assertion succeeds. Otherwise they signal an error of class “ympes-error” (with an optional subclass if one is supplied when calling the assertion).
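
To make that behaviour concrete, here is a minimal sketch of what such an assertion could look like. It is an approximation under my own assumptions and not the ympes source: the function name, the wording of the message and the exact checks are hypothetical, and only the .arg, .call and .subclass arguments mirror those used in the examples that follow.

```r
# Hypothetical assertion (not the ympes implementation)
assert_scalar_int_sketch <- function(
    x,
    .arg = deparse(substitute(x)),
    .call = sys.call(-1L),
    .subclass = NULL
) {
    # Success: return NULL invisibly
    if (is.integer(x) && length(x) == 1L) return(invisible(NULL))
    # Failure: signal a classed condition that reports the caller's call
    msg <- sprintf("`%s` must be an integer vector of length 1.", .arg)
    stop(structure(
        list(message = msg, call = .call),
        class = c(.subclass, "ympes-error", "error", "condition")
    ))
}

f_sketch <- function(i) {
    assert_scalar_int_sketch(i)
    TRUE
}
f_sketch(1L)
#> [1] TRUE
tryCatch(f_sketch("cat"), error = class)
#> [1] "ympes-error" "error"       "condition"
```

Building the condition by hand, rather than calling stop() with a message, is what allows the extra classes to be attached and the reported call to point at the calling function rather than at the assertion itself.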

# Use in a user facing function
fun <- function(i, d, l, chr, b) {
    assert_scalar_int(i)
    TRUE
}
fun(i=1L)
#> [1] TRUE
try(fun(i="cat"))

# Use in an internal function
internal_fun <- function(a) {
    assert_string(
        a,
        .arg = deparse(substitute(a)),
        .call = sys.call(-1L),
        .subclass = "example_error"
    )
    TRUE
}
external_fun <- function(b) {
    internal_fun(a=b)
}

external_fun(b="cat")
#> [1] TRUE
try(external_fun(b = letters))
tryCatch(external_fun(b = letters), error = class)
#> [1] "example_error" "ympes-error"   "error"         "condition"