---
title: "Introduction to {photon}"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to {photon}}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include=FALSE}
library(photon)
options(photon_movers = FALSE)
```

This vignette is an introduction to the `{photon}` package, an interface to the [photon](https://photon.komoot.io/) geocoder developed by [komoot](https://www.komoot.com/). Photon is open-source, based on [OpenStreetMap](https://www.openstreetmap.org/) data, and powered by the [Elasticsearch](https://www.elastic.co/elasticsearch) search engine. According to komoot, it is fast, scalable, multilingual, typo-tolerant, and up-to-date. Photon can do unstructured geocoding, reverse geocoding, and (under special circumstances) structured geocoding. Komoot offers a public photon API (https://photon.komoot.io/), but you can also set up a photon instance on a local machine.

`{photon}` supports both online and offline geocoding. Online geocoding through komoot's public API is appealing because it is convenient and offers up-to-date global coverage. It is also easy to use in `{photon}`. First, tell R that you want to use the public API. This is done with the workhorse function `new_photon()`. To set up online geocoding, simply call it without arguments:

```{r public_api, eval=FALSE}
new_photon()
#>
#> Type : remote
#> Server : https://photon.komoot.io/
```

The created `photon` object is attached to the session and does not have to be stored manually. Now you can geocode.


```{r geocode, eval=FALSE}
cities1 <- geocode(c("Sanaa", "Caracas"), osm_tag = ":city")
cities1
#> Simple feature collection with 2 features and 12 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -66.9146 ymin: 10.50609 xmax: 44.20588 ymax: 15.35386
#> Geodetic CRS: WGS 84
#> # A tibble: 2 × 13
#> idx osm_type osm_id country osm_key countrycode osm_value name county state type extent geometry
#>
#> 1 1 N 2.58e8 Yemen place YE city Sana… At Ta… Aman… dist… (44.20588 15.35386)
#> 2 2 R 1.12e7 Venezu… place VE city Cara… Munic… Capi… city (-66.9146 10.50609)
```

Similarly, you can also reverse geocode. `{photon}` fully supports `sf` objects: all geocoding functions return `sf` data frames, and `reverse()` accepts `sf` and `sfc` objects as input.

```{r reverse, eval=FALSE}
cities2 <- reverse(cities1, osm_tag = ":city")
cities2
#> Simple feature collection with 2 features and 12 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -66.9146 ymin: 10.50609 xmax: 44.20588 ymax: 15.35386
#> Geodetic CRS: WGS 84
#> # A tibble: 2 × 13
#> idx osm_type osm_id country osm_key countrycode osm_value name county state type extent geometry
#>
#> 1 1 N 2.58e8 Yemen place YE city Sana… At Ta… Aman… dist… (44.20588 15.35386)
#> 2 2 R 1.12e7 Venezu… place VE city Cara… Munic… Capi… city (-66.9146 10.50609)
```

```{r compare, eval=FALSE}
all.equal(cities1, cities2)
#> [1] TRUE
```

Online geocoding is nice and it is most likely what you need for basic tasks. But what if online geocoding is not enough? What if you need to geocode a dataset of 200,000 places? What if you need to geocode sensitive information from survey respondents? And what about structured geocoding?

# Offline geocoding

The photon backend is freely available on the [photon GitHub repository](https://github.com/komoot/photon). With it, you can set up a local instance of photon.
Offline geocoding is nice because it is extremely fast and versatile, and it doesn't send your potentially sensitive data around the internet. In many cases, offline geocoding is imperative, yet setting up an offline geocoder is usually quite cumbersome. `{photon}` takes over this task!

To run photon, you need Java 11 or higher. Setting up local photon also works through `new_photon()`. This time, we pass a path where the necessary files should be stored and a country for which a search index should be downloaded. While global coverage is also possible, the global search index is extremely large (around 80 GB). By default, `new_photon()` downloads a search index tagged with `latest`, but it is also possible to request a search index created at a specific date.

```{r local_photon, eval=FALSE}
path <- file.path(tempdir(), "photon")
photon <- new_photon(path, country = "Samoa")
#> ℹ java version "22" 2024-03-19
#> ℹ Java(TM) SE Runtime Environment (build 22+36-2370)
#> ℹ Java HotSpot(TM) 64-Bit Server VM (build 22+36-2370, mixed mode, sharing)
#> ✔ Successfully downloaded photon 0.5.0. [7s]
#> ✔ Successfully downloaded search index. [590ms]
#> • Version: 0.5.0
#> • Coverage: Samoa
#> • Time: latest
```

The resulting object is an instance of an R6 class with a few methods to control the photon instance. To start photon, run `$start()`. This starts an external Java process, which can be accessed using the `$proc` attribute.

```{r start, eval=FALSE}
photon$start()
#> Running java -jar photon-0.5.0.jar -listen-ip 0.0.0.0 -listen-port 2322
#> 2024-10-25 17:04:26,912 [main] WARN org.elasticsearch.node.Node - version [5.6.16-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
#> ✔ Photon is now running. [11s]

photon$proc
#> PROCESS 'java', running, pid 22744.
```

To check if the service is up and running, you can use `$is_ready()`.


```{r is_ready, eval=FALSE}
photon$is_ready()
#> [1] TRUE
```

Finally, to properly stop photon after you have used it, you can run `$stop()`. You do not actually _need_ to run it manually, because it is (implicitly) executed on two occasions:

1. on garbage collection, and
2. when the R session ends and external processes are killed.

```{r stop, eval=FALSE}
photon$stop()
```

To compare offline and online geocoding, let's benchmark them by geocoding the Samoan capital Apia. The offline benchmark requires the local instance to be running, so run it before calling `$stop()`:

```{r benchmark1, eval=FALSE}
# offline geocoding
bench::mark(geocode("Apia", limit = 1), iterations = 25)$median
#> [1] 17.1ms
```

```{r benchmark2, eval=FALSE}
# online geocoding
new_photon()
bench::mark(geocode("Apia", limit = 1), iterations = 25)$median
#> [1] 1.05s
```

That is a speed increase by a factor of about 60 (and possibly more on faster machines)!

Finally, to clean up photon, i.e. stop the instance and delete the photon directory, run `$purge()`.

```{r purge, eval=FALSE}
photon$purge()
#> ℹ Purging an instance kills the photon process and removes the photon directory.
#> Continue? (y/N/Cancel) y
```
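
As a quick check of the speed-up quoted for the benchmark earlier in this section, the ratio of the two medians can be computed directly. This is only a sketch: the variable names are illustrative, and the timings are copied from the output shown in this vignette, so they will differ between machines and network connections.

```{r speedup, eval=FALSE}
# Median timings copied from the benchmark output in this vignette (in seconds)
online_median <- 1.05          # public API: 1.05 s
offline_median <- 17.1 / 1000  # local instance: 17.1 ms

# How many times faster the local instance was in that particular run
speedup <- online_median / offline_median
round(speedup, 1)
#> [1] 61.4
```

Since the online timing includes network latency, the ratio mostly reflects the connection to the public API and should not be read as a fixed property of photon itself.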