scholid

scholid provides lightweight, dependency-free utilities for working with scholarly identifiers in R. The package is designed as a small, well-tested foundation that other packages and data workflows can safely reuse. It supports twenty identifier types; see Scope and scholid_types().

See the full documentation at the scholid website.

For online lookup, conversion, metadata retrieval, and linked identifier discovery, see scholidonline.

Installation

Install the released version from CRAN:

install.packages("scholid")

Scope

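The package focuses on common identifier systems used in scholarly communication. scholid_types() returns the supported type keys: doi, arxiv, bibcode, openalex, swhid, ark, isni, orcid, ror, rrid, uniprot, refseq, sra, geo, bioproject, assembly, isbn, issn, pmcid, and pmid.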

Interface

User-available functions:

Function                      Purpose
scholid_types()               List supported scholarly identifier types
is_scholid(x, type)           Test whether values conform to a given identifier type
normalize_scholid(x, type)    Normalize identifiers to canonical form
extract_scholid(text, type)   Extract identifiers of a given type from free text
classify_scholid(x)           Guess the identifier type of each input value
detect_scholid_type(x)        Detect identifier types from canonical or wrapped input values
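
These functions can be combined. The following is a minimal sketch, not a documented workflow, of classifying mixed input and then normalizing each value according to its guessed type. It assumes that normalize_scholid() returns a character vector of the same length as x; the names ids, types, normalized, and result are illustrative only.

```r
# mixed input: a DOI, an ORCID iD, and a non-identifier
ids <- c("10.1000/182", "0000-0002-1825-0097", "not an id")

# guess the type of each value; unmatched values come back as NA
types <- scholid::classify_scholid(x = ids)

# normalize each group of values with its own type
# (assumption: normalize_scholid() returns one string per input value)
normalized <- rep(NA_character_, length(ids))
for (type in unique(types[!is.na(types)])) {
  idx <- which(types == type)
  normalized[idx] <- scholid::normalize_scholid(x = ids[idx], type = type)
}

# collect the results side by side
result <- data.frame(
  input      = ids,
  type       = types,
  normalized = normalized,
  stringsAsFactors = FALSE
)
result
```

Values that classify_scholid() cannot match keep NA in both the type and normalized columns, so unrecognized input stays visible rather than being dropped.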

Examples

# list supported scholarly identifier types
scholid::scholid_types()
##  [1] "doi"        "arxiv"      "bibcode"    "openalex"   "swhid"     
##  [6] "ark"        "isni"       "orcid"      "ror"        "rrid"      
## [11] "uniprot"    "refseq"     "sra"        "geo"        "bioproject"
## [16] "assembly"   "isbn"       "issn"       "pmcid"      "pmid"
# test whether values match a given identifier type
scholid::is_scholid(
  x    = "10.1000/182",
  type = "doi"
)
## [1] TRUE
# normalize identifiers to canonical form
scholid::normalize_scholid(
  x    = "https://doi.org/10.1000/182",
  type = "doi"
)
## [1] "10.1000/182"
# extract identifiers of a given type from free text
scholid::extract_scholid(
  text = "See https://doi.org/10.1000/182 for details.",
  type = "doi"
)
## [[1]]
## [1] "10.1000/182"
# classify the identifier type of each input value
scholid::classify_scholid(
  x = c(
    "10.1000/182",
    "0000-0002-1825-0097",
    "not an id"
  )
)
## [1] "doi"   "orcid" NA
# detect identifier types from canonical or wrapped input values
scholid::detect_scholid_type(
  x = c(
    "https://doi.org/10.1000/182",
    "ORCID: 0000-0002-1825-0097",
    "arXiv:2101.00001",
    "not an id"
  )
)
## [1] "doi"   "orcid" "arxiv" NA
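
The calls above can also be chained. As a hedged sketch, the following collects DOIs from several text snippets, removes duplicates, and keeps only the values that pass validation. It assumes that extract_scholid() accepts a character vector and returns a list with one element per input string, as the list output above suggests; the names notes, found, dois, and valid are illustrative only.

```r
# two snippets that mention the same DOI
notes <- c(
  "See https://doi.org/10.1000/182 for details.",
  "Cited as https://doi.org/10.1000/182 in the appendix."
)

# assumption: one list element per input string
found <- scholid::extract_scholid(text = notes, type = "doi")

# flatten to a character vector and drop duplicates
dois <- unique(unlist(found))

# keep only values that pass validation
valid <- dois[scholid::is_scholid(x = dois, type = "doi")]
valid
```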

For more detailed usage patterns, see the Get started vignette.

License

MIT