---
title: "Getting Started with rmake"
author: "Michal Burda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with rmake}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r gs-setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
library(rmake)
```

## Introduction

R is a mature scripting language for statistical computation and data processing. An important advantage of R is that it supports **repeatable** statistical analyses: all steps of data processing are programmed in scripts, so the whole process can be re-executed after any change in the data or in the processing steps.

Several R packages help to achieve repeatability of statistical computations, such as `knitr` and `rmarkdown`. These tools allow writing R scripts that generate reports combining text with tables and figures produced from data.

However, as analyses grow in complexity, manual re-execution of the whole process may become tedious, error-prone, and computationally demanding. Complex analyses typically involve:

- Many pre-processing steps on large datasets
- Repetitive execution of commands differing only in parameters
- Production of multiple output files in various formats

It is inefficient to re-run all pre-processing steps to refresh the final report after every change. The caching mechanism provided by `knitr` is helpful but limited to a single report. Splitting a complex analysis into several parts and saving intermediate results into files is rational, but it brings another challenge: **management of dependencies** between inputs, outputs, and the underlying scripts.

This is where **Make** comes in. Make is a tool that controls the generation of files from source data and script files by reading dependencies from a `Makefile` and comparing timestamps to determine which files need to be refreshed.
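For readers unfamiliar with Make, a hand-written rule illustrating this mechanism might look as follows (an illustrative sketch, not output generated by `rmake`):

```makefile
# Rebuild sums.csv whenever script.R or data.csv carries a newer
# timestamp than sums.csv; the recipe line then re-runs the R script.
sums.csv: script.R data.csv
	Rscript script.R
```

Make compares the timestamp of the target on the left of the colon with those of its prerequisites on the right and executes the recipe only when a prerequisite is newer than the target.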
The `rmake` package provides tools for easy generation of Makefiles for statistical and data-manipulation tasks in R.

## Key Features

The main features of `rmake` are:

- Use of the well-known Make tool
- Easy definition of file dependencies in the R language
- High flexibility through parameterized execution and programmatic rule generation
- Simple, short code thanks to the `%>>%` pipeline operator and templating
- Support for R scripts and R Markdown files
- Extensibility with user-defined rule types
- Isolated and parallel execution via Make's parallel processing
- Support for all platforms: Unix (Linux), macOS, Windows, and Solaris
- Compatibility with RStudio

## Why Use rmake?

R allows the development of **repeatable** statistical analyses. However, when analyses grow in complexity, manual re-execution after any change may become tedious and error-prone. **Make** is a widely accepted tool for managing the generation of resulting files from source data and script files. `rmake` makes it easy to generate Makefiles for R analytical projects.

## Installation

To install `rmake` from CRAN:

```{r gs-install, eval=FALSE}
install.packages("rmake")
```

Alternatively, install the development version from GitHub:

```{r install_github, eval=FALSE}
install.packages("devtools")
devtools::install_github("beerda/rmake")
```

Load the package:

```{r load}
library(rmake)
```

## Prerequisites

### System Requirements

- **R**: version 3.5.0 or higher
- **Make**: GNU Make or a compatible make tool
  - On Linux/macOS: usually pre-installed
  - On Windows: install Rtools (which includes make)

### Environment Variables

The package requires the `R_HOME` environment variable to be properly set. This variable indicates the directory where R is installed and is set automatically when running from within R or RStudio.

#### When is R_HOME needed?

When running `make` from the command line (outside of R), you may need to set `R_HOME` manually.
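The reason is that recipes in the generated Makefile locate `Rscript` through this variable, so an empty `R_HOME` makes the build fail when `make` is invoked from a plain terminal. A simplified sketch of such a recipe (the Makefile actually produced by `rmake` may differ in detail):

```makefile
# Simplified sketch: Rscript is resolved relative to R_HOME, which is
# why R_HOME must be set when make runs outside of an R session.
sums.csv: script.R data.csv
	"$(R_HOME)/bin/Rscript" script.R
```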
#### Finding R_HOME

To find the correct value for your system, run this in R:

```{r check_env, eval=FALSE}
R.home()
```

You can also check the current value of the environment variable:

```{r check_env_vars, eval=FALSE}
Sys.getenv("R_HOME")
```

#### Setting R_HOME

**On Linux/macOS:**

```bash
export R_HOME=/usr/lib/R  # Use the path from R.home()
```

**On Windows (Command Prompt):**

```cmd
set R_HOME=C:\Program Files\R\R-4.3.0
```

**On Windows (PowerShell):**

```powershell
$env:R_HOME = "C:\Program Files\R\R-4.3.0"
```

For a permanent setup, add the export command to your shell configuration file (`.bashrc`, `.zshrc`, etc. on Unix-like systems) or set a system environment variable on Windows.

For more information on R environment variables, see the [official R documentation](https://stat.ethz.ch/R-manual/R-devel/library/base/html/EnvVar.html).

## Project Initialization

### Creating Skeleton Files

To start a new project with `rmake`:

```{r gs-skeleton, eval=FALSE}
library(rmake)
rmakeSkeleton(".")
```

This creates two files:

- `Makefile.R` - an R script that generates the Makefile
- `Makefile` - the generated Makefile (initially minimal)

The initial `Makefile.R` contains:

```{r skeleton_content, eval=FALSE}
library(rmake)
job <- list()
makefile(job, "Makefile")
```

## Basic Example

Let's walk through a simple example.
Suppose we have:

- `data.csv` - the input data file
- `script.R` - an R script that processes the data
- `sums.csv` - the computed results (output)

### Step 1: Create the Data File

Create `data.csv`:

```
ID,V1,V2
a,2,8
b,9,1
c,3,3
```

### Step 2: Create the Processing Script

Create `script.R`:

```{r script_content, eval=FALSE}
d <- read.csv("data.csv")
sums <- data.frame(ID = "sum", V1 = sum(d$V1), V2 = sum(d$V2))
write.csv(sums, "sums.csv", row.names = FALSE)
```

### Step 3: Define the Build Rule

Edit `Makefile.R`:

```{r define_rule, eval=FALSE}
library(rmake)
job <- list(rRule(target = "sums.csv", script = "script.R", depends = "data.csv"))
makefile(job, "Makefile")
```

### Step 4: Run the Build

Execute make:

```{r run_make, eval=FALSE}
make()
```

Make will:

1. Regenerate `Makefile` (if `Makefile.R` changed)
2. Execute `script.R` to create `sums.csv`

Subsequent calls to `make()` will do nothing unless some files change.

## Using the Pipe Operator

The `%>>%` pipe operator makes rule definitions more readable:

```{r pipe_example, eval=FALSE}
library(rmake)
job <- "data.csv" %>>% rRule("script.R") %>>% "sums.csv"
makefile(job, "Makefile")
```

This is equivalent to the previous example but more concise.

## Adding a Markdown Report

Let's extend our example to create a PDF report.
Create `analysis.Rmd`:

````markdown
---
title: "Analysis"
output: pdf_document
---

# Sums of data rows

```{r, echo=FALSE, results='asis'}`r ''`
sums <- read.csv('sums.csv')
knitr::kable(sums)
```
````

Update `Makefile.R`:

```{r add_markdown, eval=FALSE}
library(rmake)
job <- list(
  rRule(target = "sums.csv", script = "script.R", depends = "data.csv"),
  markdownRule(target = "analysis.pdf", script = "analysis.Rmd", depends = "sums.csv")
)
makefile(job, "Makefile")
```

Or using pipes:

```{r pipe_chain, eval=FALSE}
library(rmake)
job <- "data.csv" %>>% rRule("script.R") %>>% "sums.csv" %>>%
  markdownRule("analysis.Rmd") %>>% "analysis.pdf"
makefile(job, "Makefile")
```

Run make again:

```{r run_make2, eval=FALSE}
make()
```

## Running Make

### From R

```{r make_options, eval=FALSE}
# Run all tasks
make()

# Run a specific task
make("all")

# Clean generated files
make("clean")

# Parallel execution (8 jobs)
make("-j8")
```

### From Command Line

```bash
make          # Run all tasks
make clean    # Clean generated files
make -j8      # Parallel execution
```

### From RStudio

1. Go to **Build** > **Configure Build Tools**
2. Set **Project build tools** to **Makefile**
3. Use the **Build All** button

## Visualizing Dependencies

Visualize the dependency graph:

```{r gs-visualize, eval=FALSE}
visualize(job, legend = FALSE)
```

This creates an interactive graph showing:

- **Squares**: data files
- **Diamonds**: script files
- **Ovals**: rules
- **Arrows**: dependencies

## Multiple Dependencies

Handle complex dependencies:

```{r multiple_deps}
chain1 <- "data1.csv" %>>% rRule("preprocess1.R") %>>% "intermed1.rds"
chain2 <- "data2.csv" %>>% rRule("preprocess2.R") %>>% "intermed2.rds"
chain3 <- c("intermed1.rds", "intermed2.rds") %>>% rRule("merge.R") %>>%
  "merged.rds" %>>% markdownRule("report.Rmd") %>>% "report.pdf"
job <- c(chain1, chain2, chain3)
```

Alternatively, you can define all chains directly without intermediate variables:

```{r multiple_deps_alt, eval=FALSE}
job <- c(
  "data1.csv" %>>% rRule("preprocess1.R") %>>% "intermed1.rds",
  "data2.csv" %>>% rRule("preprocess2.R") %>>% "intermed2.rds",
  c("intermed1.rds", "intermed2.rds") %>>% rRule("merge.R") %>>%
    "merged.rds" %>>% markdownRule("report.Rmd") %>>% "report.pdf"
)
```

## Rule Types

`rmake` provides several pre-defined rule types:

- **`rRule()`**: execute R scripts
- **`markdownRule()`**: render R Markdown documents
- **`knitrRule()`**: process knitr documents
- **`copyRule()`**: copy files
- **`offlineRule()`**: manual tasks with reminders

For detailed documentation on all rule types, including `depRule()`, `subdirRule()`, and custom rules, see the [Build Rules](build-rules.html) vignette.
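As a small illustration of `copyRule()`, the sketch below copies the computed file into a `www/` deployment folder. The `www/` path is an assumption made up for this example, and the `target`/`depends` argument names are assumed to follow the same pattern as `rRule()` above:

```{r copy_rule_sketch, eval=FALSE}
library(rmake)

job <- list(
  rRule(target = "sums.csv", script = "script.R", depends = "data.csv"),
  # Hypothetical deployment step: keep www/sums.csv in sync with sums.csv
  copyRule(target = "www/sums.csv", depends = "sums.csv")
)
makefile(job, "Makefile")
```

Because `www/sums.csv` depends on `sums.csv`, the copy is refreshed automatically whenever the computation re-runs.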
## Next Steps

For more information on specific topics, see these vignettes:

- [rmake Project Management](project-management.html): Learn about project initialization, running builds, cleaning, and parallel execution
- [Build Rules](build-rules.html): Comprehensive reference for all rule types (`rRule`, `markdownRule`, `knitrRule`, `copyRule`, `depRule`, `subdirRule`, `offlineRule`)
- [Tasks and Templates](tasks-and-templates.html): Advanced features including tasks, parameterized execution, and rule templates

## Summary

Key takeaways:

1. Use `rmakeSkeleton()` to initialize projects
2. Define rules in `Makefile.R`
3. Use `%>>%` for readable rule chains
4. Run `make()` to execute the build process
5. Use `visualize()` to understand dependencies

## Resources

- Package documentation: `?rmake`
- GitHub: https://github.com/beerda/rmake
- Issues: https://github.com/beerda/rmake/issues