arete: Automated REtrieval from TExt
A Python-based pipeline for extracting species occurrence data with large language models (LLMs). Includes validation tools designed to handle model hallucinations, for scientifically rigorous use of LLMs. Currently supports GPT, with more models planned, including local and non-proprietary ones. For details on the methodology, please consult the references listed under each function, such as Kent, A. et al. (1995) <doi:10.1002/asi.5090060209>, van Rijsbergen, C.J. (1979, ISBN:978-0408709293), Levenshtein, V.I. (1966) <https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf> and Krippendorff, K. (2011) <https://repository.upenn.edu/handle/20.500.14332/2089>.
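The references cited in the description are commonly associated with precision and recall, the F-measure, and string edit distance. The sketch below is only an illustration of how such measures could be used to compare LLM-extracted records against a human-annotated reference set. It is not arete's API: the function names `levenshtein` and `precision_recall_f1` and the sample records are hypothetical placeholders.

```python
from typing import Iterable, Tuple

Record = Tuple[str, str]  # hypothetical record shape: (species name, locality)


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def precision_recall_f1(extracted: Iterable[Record],
                        reference: Iterable[Record]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of extracted vs. reference records."""
    extracted_set, reference_set = set(extracted), set(reference)
    true_positives = len(extracted_set & reference_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical, made-up records used only to exercise the functions.
reference = {("Genus alpha", "Site A"), ("Genus beta", "Site B")}
extracted = {("Genus alpha", "Site A"),   # exact match
             ("Genus beta", "Site D"),    # near miss: one character differs
             ("Genus gamma", "Site C")}   # not in the reference (possible hallucination)

precision, recall, f1 = precision_recall_f1(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print("edit distance for the near miss:", levenshtein("Site B", "Site D"))
```

Exact-match scoring treats the near miss as an error, while the edit distance shows how close it was to a reference record. A real workflow would need to decide how small a distance counts as a match.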
| Version: | 0.1 |
| Depends: | R (≥ 4.3.0) |
| Imports: | terra, cld2, stringr, reticulate, pdftools, fedmatch, kableExtra, dplyr, gecko, methods, ggplot2, jsonlite, googledrive, irr, rmarkdown |
| Suggests: | knitr |
| Published: | 2025-10-20 |
| DOI: | 10.32614/CRAN.package.arete |
| Author: | Vasco V. Branco [cre, aut], Vaughn Shirey [ctb], Thomas Merrien [ctb], Pedro Cardoso [aut] |
| Maintainer: | Vasco V. Branco <vasco.branco at helsinki.fi> |
| License: | GPL-3 |
| NeedsCompilation: | no |
| CRAN checks: | arete results |
Linking:
Please use the canonical form
https://CRAN.R-project.org/package=arete
to link to this page.