---
title: "Execution Plans and Streaming Workflows"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Execution Plans and Streaming Workflows}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Large exact jobs are not only about arithmetic. They are also about controlling
how much work is done per block and where the results are stored. `bigKNN`
addresses those two questions with:

- execution plans, which turn a memory budget into a reproducible block size
- streaming APIs, which write results into destination `big.matrix` objects
  instead of returning everything as dense R objects

This vignette uses a moderately sized toy example so the code stays readable,
but the same patterns scale to much larger references and query batches.

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

if (!requireNamespace("bigmemory", quietly = TRUE)) {
  cat("This vignette requires the 'bigmemory' package.\n")
  knitr::knit_exit()
}

library(bigKNN)
library(bigmemory)
```

```{r helpers, include=FALSE}
knn_table <- function(result, query_ids) {
  do.call(rbind, lapply(seq_along(query_ids), function(i) {
    data.frame(
      query = query_ids[i],
      rank = seq_len(result$k),
      neighbor = result$index[i, ],
      distance = signif(result$distance[i, ], 5),
      row.names = NULL
    )
  }))
}

radius_slice_table <- function(index, distance, offset, query_ids, i) {
  start <- as.integer(offset[i])
  end <- as.integer(offset[i + 1L] - 1L)

  if (start > end) {
    return(data.frame(
      query = query_ids[i],
      neighbor = integer(0),
      distance = numeric(0)
    ))
  }

  data.frame(
    query = query_ids[i],
    neighbor = as.integer(index[start:end]),
    distance = signif(as.numeric(distance[start:end]), 5),
    row.names = NULL
  )
}
```

# Why plans and streaming matter for large matrices

For a small exploratory search, the simplest thing is often best:

- call `knn_bigmatrix()`
- get a pair of dense `index` and `distance` matrices back
- keep working in ordinary R objects

That becomes less attractive as the job grows. The exact computation may still
fit, but the output itself can become large:

- `k`-NN output scales like `n_query x k`
- radius output scales with the total number of matches, which may vary a lot
  from one query to the next
- exact search still needs a block size that matches the memory budget you want
  to spend

Execution plans and streaming let you control those two pressure points
directly.

# Build a Repeatable Example

The reference matrix below has 160 rows and 4 columns. It is large enough to
show real changes in derived block size, but still small enough that we can
inspect outputs directly in the vignette.

```{r create-data}
i <- seq_len(160)

reference_matrix <- cbind(
  x1 = i,
  x2 = (i %% 7) + 1,
  x3 = (i %% 11) + 0.5,
  x4 = (i %% 13) + 2
)

reference <- as.big.matrix(reference_matrix)

dense_query <- rbind(
  reference_matrix[5, ] + c(0.2, 0.0, 0.1, 0.0),
  reference_matrix[50, ] + c(-0.3, 0.2, 0.0, 0.1),
  reference_matrix[120, ] + c(0.4, -0.1, 0.2, 0.0),
  reference_matrix[151, ] + c(0.1, 0.2, -0.2, 0.3)
)

query_ids <- paste0("q", seq_len(nrow(dense_query)))

dim(reference_matrix)
dense_query
```

# Building a plan with `knn_plan_bigmatrix()`

`knn_plan_bigmatrix()` converts a target memory budget into a search plan. The
plan stores the metric, the requested thread count, whether progress reporting
should be enabled, and most importantly the derived `block_size`.

```{r build-plan}
plan <- knn_plan_bigmatrix(
  reference,
  metric = "euclidean",
  memory_budget = "64KB",
  num_threads = 2L,
  progress = FALSE
)

plan
```

The printed object is the key summary you usually need:

- `metric`: the exact distance used by plan-aware calls
- `memory_budget`: the requested budget in a normalized form
- `block_size`: the number of rows processed per block
- `num_threads`: the thread request forwarded to common BLAS/OpenMP variables
- `shape`: the reference dimensions the plan was built for

# How memory budget maps to block size

Block size is derived rather than guessed. With the same reference matrix, a
larger budget yields a larger working block.

```{r plan-comparison}
plan_small <- knn_plan_bigmatrix(
  reference,
  metric = "euclidean",
  memory_budget = "4KB",
  num_threads = 2L,
  progress = FALSE
)

plan_large <- knn_plan_bigmatrix(
  reference,
  metric = "euclidean",
  memory_budget = "1MB",
  num_threads = 2L,
  progress = FALSE
)

data.frame(
  memory_budget = c(plan_small$memory_budget, plan$memory_budget, plan_large$memory_budget),
  block_size = c(plan_small$block_size, plan$block_size, plan_large$block_size),
  row.names = NULL
)
```

This is useful even though the search remains exact. The plan does not change
which neighbours are returned. It changes how much data is processed at once,
which makes resource use more predictable across runs and machines.

In practice:

- use a smaller budget when memory is tight or when multiple jobs share a host
- use a larger budget when you want fewer, larger blocks
- keep the plan object if you want the same resource policy to be reused across
  multiple calls

# Running planned in-memory search

Plans plug into the same search API you would use without them.

```{r planned-search}
planned_knn <- knn_bigmatrix(
  reference,
  query = dense_query,
  k = 3,
  plan = plan,
  exclude_self = FALSE
)

planned_knn
knn_table(planned_knn, query_ids = query_ids)
```

The result contract is unchanged. You still get dense `index` and `distance`
matrices. The difference is that `bigKNN` uses the plan's block size and thread
policy while computing them.

# Streaming `k`-NN output with `knn_stream_bigmatrix()`

When `n_query x k` is large enough that you do not want the full result held in
ordinary R matrices, switch to `knn_stream_bigmatrix()`.

The destination requirements are straightforward:

- `xpIndex` must be a writable `big.matrix` with `n_query` rows and `k` columns
- `xpDistance`, if supplied, must have the same dimensions
- `xpIndex` should use integer or double storage
- `xpDistance` should use double storage

```{r stream-knn}
index_store <- big.matrix(nrow(dense_query), 3, type = "integer")
distance_store <- big.matrix(nrow(dense_query), 3, type = "double")

streamed_knn <- knn_stream_bigmatrix(
  reference,
  query = dense_query,
  xpIndex = index_store,
  xpDistance = distance_store,
  k = 3,
  plan = plan,
  exclude_self = FALSE
)

bigmemory::as.matrix(streamed_knn$index)
round(bigmemory::as.matrix(streamed_knn$distance), 4)
```

Those streamed outputs match the in-memory result exactly:

```{r stream-knn-compare}
identical(bigmemory::as.matrix(streamed_knn$index), planned_knn$index)
all.equal(bigmemory::as.matrix(streamed_knn$distance), planned_knn$distance)
```

# Streaming radius output with `radius_stream_bigmatrix()`

Radius search needs one extra step before streaming: you have to know how much
space to allocate for the flattened matches. The simplest pattern is:

1. call `count_within_radius_bigmatrix()` to get per-query counts
2. allocate `sum(counts)` rows for the flattened `index` and `distance`
3. allocate `length(counts) + 1` rows for the offset vector
4. run `radius_stream_bigmatrix()`

```{r stream-radius-counts}
radius_counts <- count_within_radius_bigmatrix(
  reference,
  query = dense_query,
  radius = 2.2,
  plan = plan,
  exclude_self = FALSE
)

radius_counts
total_matches <- sum(radius_counts)
total_matches
```

```{r stream-radius}
radius_index_store <- big.matrix(total_matches, 1, type = "integer")
radius_distance_store <- big.matrix(total_matches, 1, type = "double")
radius_offset_store <- big.matrix(length(radius_counts) + 1L, 1, type = "double")

streamed_radius <- radius_stream_bigmatrix(
  reference,
  query = dense_query,
  xpIndex = radius_index_store,
  xpDistance = radius_distance_store,
  xpOffset = radius_offset_store,
  radius = 2.2,
  plan = plan,
  exclude_self = FALSE
)

streamed_radius
streamed_radius$n_match
```

The offset vector is what makes flattened radius output usable. It tells you
where each query's matches begin and end.

```{r stream-radius-offsets}
radius_offset <- as.vector(bigmemory::as.matrix(streamed_radius$offset))
radius_index <- as.vector(bigmemory::as.matrix(streamed_radius$index))
radius_distance <- as.vector(bigmemory::as.matrix(streamed_radius$distance))

radius_offset
radius_slice_table(radius_index, radius_distance, radius_offset, query_ids, 1)
radius_slice_table(radius_index, radius_distance, radius_offset, query_ids, 2)
```

If you prefer, you can skip the streamed route and use `radius_bigmatrix()` to
have `bigKNN` allocate the flattened vectors for you. Streaming becomes useful
when those flattened vectors are large or when you want explicit control over
their storage.

# Dense versus sparse query inputs

The public API also accepts sparse query matrices. At the moment they are
densified on the R side before exact computation, so sparse input is primarily
an interface convenience rather than a sparse-native backend.

```{r sparse-queries}
sparse_query <- Matrix::Matrix(dense_query, sparse = TRUE)

sparse_knn <- knn_bigmatrix(
  reference,
  query = sparse_query,
  k = 3,
  plan = plan,
  exclude_self = FALSE
)

identical(sparse_knn$index, planned_knn$index)
all.equal(sparse_knn$distance, planned_knn$distance)
```

That makes sparse queries especially handy when your upstream workflow already
produces `Matrix` objects, and you want to use the same exact search call
without manually converting first.

# Practical guidance for choosing output modes

A simple rule of thumb works well:

- use `knn_bigmatrix()` when `n_query x k` is small enough that ordinary R
  matrices are convenient
- use `knn_stream_bigmatrix()` when `n_query x k` is large and you want control
  over where those matrices live
- use `radius_bigmatrix()` when flattened radius output is still manageable in
  memory
- use `radius_stream_bigmatrix()` when the total number of matches is large or
  highly variable across query rows
- use `knn_plan_bigmatrix()` whenever you want repeatable memory and threading
  policy instead of ad hoc defaults

Plans and streaming do not make the search approximate. They make the exact
workflow more operationally predictable, which is often the difference between
an algorithm that works in principle and one that is pleasant to run at scale.