---
title: "myrror"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{myrror}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# build_rmd("vignettes/myrror.Rmd")
```

```{r setup}
library(myrror)
```

## Overview

`{myrror}` is a package that helps you compare two data frames. It has three types of functions:

1. **Complete comparison**: `myrror()`, the "workhorse" function, which runs a complete comparison of two datasets.
2. **Partial comparison**:
    - `compare_type()`: Compares the types of the columns in the two data frames.
    - `compare_values()`: Compares the values of the columns in the two data frames.
3. **Extract differences**: two functions that give access to the rows whose values differ between the two data frames.
    - `extract_diff_values()`: Returns the differences in a list format, one table per variable.
    - `extract_diff_table()`: Returns the differences in a table format, all variables.

For all of these functions, the results are displayed interactively in the console. Alternatively, the user can return the results as an object to use in further analysis.

## Workflow Example 1

One use case of the package is to check whether two versions of the same dataset have any discrepancies. If so, `myrror` helps the user quickly identify which variables, values, or observations differ.

In this example, we will compare one simulated dataset, `survey_data`, with a second version, `survey_data_2`. Each row is uniquely identified by two variables, `country` and `year`:

```{r}
survey_data
```

The second dataset is just a variation of the first one:

```{r}
survey_data_2
```

And it clearly has different values in `variable2`:

```{r}
all.equal(survey_data, survey_data_2)
```

`myrror()` helps the user diagnose this difference.
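
As a rough illustration of why the key columns matter, the sketch below aligns two small data frames on `country` and `year` with base R's `merge()` before comparing values. The toy data and the names `toy_a`, `toy_b`, `aligned`, and `changed` are assumptions made for this example; it is not how `myrror` is implemented.

```{r}
# Hypothetical toy data; all object names here are illustrative only
toy_a <- data.frame(
  country   = c("A", "A", "B"),
  year      = c(2010, 2011, 2010),
  variable2 = c(10, 20, 30)
)

# Same keys in a different row order, with one changed value
toy_b <- data.frame(
  country   = c("B", "A", "A"),
  year      = c(2010, 2011, 2010),
  variable2 = c(30, 25, 10)
)

# Align rows on the keys; the shared non-key column gets the suffixes below
aligned <- merge(
  toy_a, toy_b,
  by = c("country", "year"),
  suffixes = c(".a", ".b")
)

# Keep only the keyed rows whose values disagree
changed <- aligned[aligned$variable2.a != aligned$variable2.b, ]
changed
```

Matching on keys rather than on row position means that a reordered dataset is not reported as different unless a value actually changed.
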
To get a complete report of the differences between `survey_data` and `survey_data_2`, the user can run:

```{r}
myrror(survey_data, survey_data_2, by = c("country", "year"))
```

The `by` argument specifies the keys that uniquely identify each row. This is useful when comparing two datasets that have the same structure but a different row order or number of rows.

Once the differences have been identified, the user can call `extract_diff_values()` to extract the rows whose values differ between the two data frames:

```{r}
extract_diff_values()
```

Called without arguments, as above, `extract_diff_values()` operates on the comparison object created by `myrror()` and stored in the environment. To extract the differences from a different comparison, re-run `extract_diff_values()` with the appropriate arguments, for example:

```{r}
extract_diff_values(survey_data, survey_data_2, by = c("country", "year"))
```

## Workflow Example 2

Again, we want to compare two datasets, this time `survey_data` and `survey_data_4`:

```{r}
all.equal(survey_data, survey_data_4)
```

These two datasets have a different number of rows. `myrror()` identifies the difference in row counts and also provides a complete report of the differences between the two datasets.

The user can then extract the rows missing from one of the two datasets:

```{r}
extract_diff_rows(survey_data, survey_data_4, by = c("country", "year"))
```
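
The idea of rows that exist in only one dataset can be sketched in base R by comparing key combinations. This is a minimal sketch on hypothetical toy data (`toy_x`, `toy_y`, and the other object names are illustrative), not the package's implementation of `extract_diff_rows()`.

```{r}
# Hypothetical toy data keyed by country and year
toy_x <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2010, 2011, 2010, 2011),
  value   = c(1, 2, 3, 4)
)
toy_y <- toy_x[-4, ]  # same data without the last row

# One key string per row, built from the identifying columns
key_x <- paste(toy_x$country, toy_x$year, sep = "_")
key_y <- paste(toy_y$country, toy_y$year, sep = "_")

# Rows whose key combination appears in only one of the two data frames
only_in_x <- toy_x[!(key_x %in% key_y), ]
only_in_y <- toy_y[!(key_y %in% key_x), ]

only_in_x
```

Here `only_in_x` holds the single row that was dropped from `toy_y`, while `only_in_y` is empty because every key in `toy_y` also appears in `toy_x`.
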