---
title: "Countable Histograms with `gf_squareplot()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Countable Histograms with `gf_squareplot()`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4,
  dpi = 96
)
```

```{r setup, message = FALSE}
library(coursekata)
```

## Overview

`gf_squareplot()` creates histograms where individual data points are visible as stacked unit rectangles. Instead of abstract bars, each observation becomes a countable square, making sample size and distribution shape tangible.

This is particularly useful for teaching statistical concepts like sampling distributions and hypothesis testing, where students benefit from seeing that "n = 47" means 47 actual squares.

## Basic Usage

Pass a formula and data frame, just like other `gf_*` functions:

```{r basic}
gf_squareplot(~Thumb, data = Fingers)
```

## Display Modes

The `bars` parameter controls how the histogram is displayed:

- `"none"` (default): Individual squares only
- `"outline"`: Squares with bar outlines around each bin
- `"solid"`: Traditional filled bars

```{r bars-outline}
gf_squareplot(~Thumb, data = Fingers, bars = "outline")
```

## Customizing Appearance

You can customize fill color, `binwidth`, and axis limits:

```{r custom}
gf_squareplot(~Thumb, data = Fingers, fill = "coral",
              binwidth = 5, xrange = c(30, 90))
```

## Integer Data

For integer-valued data with a small range, `gf_squareplot()` automatically selects a `binwidth` of 1, so each integer gets its own column:

```{r integer}
int_data <- data.frame(rolls = sample(1:6, 30, replace = TRUE))
gf_squareplot(~rolls, data = int_data)
```

## Large Samples

When any bin has more than 75 observations, the function automatically switches to solid bars to keep the display readable.
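The fallback appears to use the same filled bars as the `"solid"` display mode listed under Display Modes. A minimal way to preview that look without building a large sample is to request the mode explicitly on the `Fingers` data. The name `p_solid` is only an illustrative variable, not part of the package:

```{r bars-solid}
library(coursekata)  # already attached by the setup chunk; repeated so this chunk runs on its own
# Ask for traditional filled bars directly rather than relying on the large-sample fallback
p_solid <- gf_squareplot(~Thumb, data = Fingers, bars = "solid")
p_solid
```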
You can opt into subdivision instead with `auto_subdivide = TRUE`, which splits wide bins into sub-columns so the rectangles remain countable. The chunk below draws the same 500-observation sample first with the default settings and then with subdivision enabled:

```{r large-sample}
large_data <- data.frame(x = rnorm(500, mean = 50, sd = 10))
gf_squareplot(~x, data = large_data)                         # default settings
gf_squareplot(~x, data = large_data, auto_subdivide = TRUE)  # split bins into sub-columns
```

## Teaching Features

### Mean Line

Show a dashed line at the sample mean:

```{r mean-line}
gf_squareplot(~Thumb, data = Fingers, show_mean = TRUE)
```

### DGP Overlay

The `show_dgp = TRUE` option adds a teaching overlay for hypothesis-testing contexts. It shows:

- A top axis labeled "Population Parameter (DGP)" with the population model equation
- A bottom axis labeled "Parameter Estimate" with the sample estimate equation
- A red triangle and label marking the null hypothesis position (b1 = 0)

```{r dgp, fig.height = 5}
set.seed(42)
# Sampling distribution of b1: 100 slopes, each fit to a random sample of 30 rows
samp_dist <- do(100) * b1(Thumb ~ Height, data = sample(Fingers, 30))
gf_squareplot(~b1, data = samp_dist, show_dgp = TRUE, show_mean = TRUE,
              xrange = c(-0.5, 1.5), xbreaks = seq(-0.5, 1.5, by = 0.25))
```

## Factor Input

When the input is a factor with numeric levels, all levels are displayed on the x-axis, even if some have zero counts:

```{r factor}
ratings <- factor(sample(1:5, 20, replace = TRUE, prob = c(1, 2, 4, 2, 1)),
                  levels = 1:5)
df <- data.frame(rating = ratings)
gf_squareplot(~rating, data = df)
```
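The ratings in the chunk above come from random draws, so whether any level actually ends up empty depends on the sample. A factor built from fixed values makes the zero-count behavior easy to check: in this sketch, levels 3 and 5 never occur, yet both should still appear on the x-axis. The names `ratings_gaps` and `df_gaps` are illustrative only:

```{r factor-gaps}
library(coursekata)  # already attached by the setup chunk; repeated so this chunk runs on its own
# Fixed values: levels 3 and 5 receive no observations
ratings_gaps <- factor(c(1, 2, 2, 4, 4, 4), levels = 1:5)
df_gaps <- data.frame(rating = ratings_gaps)
table(df_gaps$rating)  # counts per level, including the zero-count levels
gf_squareplot(~rating, data = df_gaps)
```

Comparing the table with the plot is a quick way to confirm that empty levels keep their place on the axis rather than being dropped.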