--- title: "Importing a database from Nominatim" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Importing a database from Nominatim} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, include=FALSE} library(photon) ``` As specified in the [introduction vignette](photon.html), you can download pre-built search indices for selected country extracts. If you require more freedom in providing the geocoding data, you can choose to import from an existing Nominatim database. Importing from Nominatim is also a requirement if you want to enable structured geocoding queries. This vignette guides you through the setup and import of an external Nominatim database as well as the setup of structured geocoding support through OpenSearch-based photon. Technically, Nominatim databases can only be reliably set up on Linux systems. Here, we use the `mediagis/nominatim` docker image to set up Nominatim irrespective of the operating system. You can use the helper functions `cmd_options()` and `run()` to run a Nominatim docker. It is important to expose the port 5432 on the host machine, otherwise photon is not able to connect to the database. ```{r, eval=FALSE} opts <- cmd_options( e = "PBF_URL=https://download.geofabrik.de/australia-oceania/samoa-latest.osm.pbf", e = "NOMINATIM_PASSWORD=MNdtC2*pP#aMbe", e = "FREEZE=true", p = "8080:8080", p = "5432:5432", name = "nominatim", "mediagis/nominatim:4.4", use_double_hyphens = TRUE ) nominatim <- process$new("docker", c("run", opts)) ``` To verify that the database can be connected to, you can connect to it from R. ```{r, eval=FALSE} library(RPostgres) db <- dbConnect(Postgres(), password = "MNdtC2*pP#aMbe", user = "nominatim") dbGetInfo(db) #> $dbname #> [1] "nominatim" #> #> $host #> [1] "localhost" #> #> $port #> [1] "5432" #> #> $username #> [1] "nominatim" #> #> $protocol.version #> [1] 3 #> #> $server.version #> [1] 140013 #> #> $db.version #> [1] 140013 #> #> $pid #> [1] 604 dbDisconnect(db) ``` If the database can be connected to, you can start a new photon instance and import the database using `$import()`. The database import creates the folder `photon_data` inside the given photon directory. ```{r, eval=FALSE} dir <- file.path(tempdir(), "photon") photon <- new_photon(dir, overwrite = TRUE) #> ℹ java version "22" 2024-03-19 #> ℹ Java(TM) SE Runtime Environment (build 22+36-2370) #> ℹ Java HotSpot(TM) 64-Bit Server VM (build 22+36-2370, mixed mode, sharing) #> ✔ Successfully downloaded photon 0.5.0. [8.2s] #> ℹ No search index downloaded! Download one or import from a Nominatim database. #> • Version: 0.5.0 photon$import(host = "localhost", password = "MNdtC2*pP#aMbe") #> 2024-10-24 23:07:35,904 [main] WARN org.elasticsearch.node.Node - version [5.6.16-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production #> 2024-10-24 23:07:43,326 [main] INFO de.komoot.photon.elasticsearch.Server - Started elastic search node #> 2024-10-24 23:07:43,326 [main] INFO de.komoot.photon.App - Make sure that the ES cluster is ready, this might take some time. #> 2024-10-24 23:07:43,905 [main] INFO de.komoot.photon.App - ES cluster is now ready. #> 2024-10-24 23:07:45,299 [main] INFO de.komoot.photon.App - Starting import from nominatim to photon with languages: en,fr,de,it #> 2024-10-24 23:07:45,300 [main] INFO de.komoot.photon.nominatim.NominatimConnector - Start importing documents from nominatim (global) #> 2024-10-24 23:07:59,080 [main] INFO de.komoot.photon.nominatim.ImportThread - Finished import of 2085 photon documents. #> 2024-10-24 23:07:59,080 [main] INFO de.komoot.photon.App - Imported data from nominatim to photon with languages: en,fr,de,it ``` After the import has finished, you can start the photon instance. ```{r, eval=FALSE} photon$start() #> 2024-10-24 23:26:46,360 [main] WARN org.elasticsearch.node.Node - version [5.6.16-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production #> ✔ Photon is now running. [11.1s] ``` ```{r, eval=FALSE} geocode("Apia") #> Simple feature collection with 3 features and 13 fields #> Geometry type: POINT #> Dimension: XY #> Bounding box: xmin: -171.7631 ymin: -13.83613 xmax: -171.7512 ymax: -13.82611 #> Geodetic CRS: WGS 84 #> # A tibble: 3 × 14 #> idx osm_type osm_id country osm_key city street countrycode osm_value name state type extent #> #> 1 1 W 1322127938 Samoa place NA NA WS city Apia Tuam… city #> 2 1 W 723300892 Samoa landuse Matautu Tai NA WS harbour Apia… Tuam… other #> 3 1 W 666117780 Samoa tourism Levili Levili St… WS attracti… Apia… Tuam… house #> # ℹ 1 more variable: geometry ``` ## OpenSearch Previous setups of photon were based on the ElasticSearch search engine. Photon also offers a version that is based on [OpenSearch](https://opensearch.org/). This photon version is necessary to enable structured geocoding queries. Since photon 0.6.0, OpenSearch jar files are provided with new photon releases and can be downloaded by setting `opensearch = TRUE` in the `new_photon()` function. ```{r, eval=FALSE} # set opensearch = TRUE to use OpenSearch photon photon <- new_photon(dir, opensearch = TRUE, quiet = TRUE) # set structured = TRUE to enable structured geocoding photon$import(host = "localhost", password = "MNdtC2*pP#aMbe", structured = TRUE) ``` Again, the `$import()` method created a directory `photon_data`, which is 10-20% larger than the ElasticSearch version. After starting photon, you can verify that structured geocoding is enabled by running `has_structured_support()`. ```{r, eval=FALSE} photon$start() has_structured_support() #> [1] TRUE ``` Since structured geocoding now works, you can now send entire datasets with structured address data. Structured geocoding allows you to search for specific elements of an address instead of passing free text queries. In the following example, we search for three different spatial features by querying their state, street, and housenumber. Photon is able to geocode them with very high precision due to the detailed data structure we provide. ```{r, eval=FALSE} place_data <- data.frame( housenumber = c(NA, "77C", NA), street = c("Falealilli Cross Island Road", "Main Beach Road", "Le Mafa Pass Road"), state = c("Tuamasaga", "Tuamasaga", "Atua") ) structured(place_data, limit = 1) #> Simple feature collection with 3 features and 14 fields #> Geometry type: POINT #> Dimension: XY #> Bounding box: xmin: -171.7759 ymin: -14.04544 xmax: -171.451 ymax: -13.8338 #> Geodetic CRS: WGS 84 #> # A tibble: 3 × 15 #> idx osm_type osm_id country osm_key city countrycode osm_value name state type extent housenumber street #> #> 1 1 W 319147189 Samoa highway Siumu Uta WS primary Faleal… Tuam… stre… NA NA #> 2 2 W 569855981 Samoa building Apia WS yes NA Tuam… house 77C Main … #> 3 3 W 40681149 Samoa highway Lalomanu WS primary Main S… Ātua stre… NA NA #> # ℹ 1 more variable: geometry ```