The hetu R package provides tools to work with Finnish personal identity numbers (hetu, short for the Finnish term “henkilötunnus”). Some functions can also be used with Finnish Business ID numbers (y-tunnus).
Where possible, we have unified the syntax with sweidnumbr.
Install the current devel version in R:
devtools::install_github("ropengov/hetu")
Test the installation by loading the library:
library(hetu)
We also recommend setting the UTF-8 encoding:
Sys.setlocale(locale="UTF-8")
Finnish personal identification numbers (Finnish: henkilötunnus, hetu for short) are used to identify citizens and permanent residents. A hetu PIN consists of eleven characters, DDMMYYCZZZQ, where DDMMYY is the day, month and year of birth, C is the century marker, ZZZ is the individual number and Q is the control character.
Males have odd individual numbers and females have even individual numbers. The control character is determined by dividing the nine-digit number DDMMYYZZZ by 31 and using the remainder (modulo 31) to pick the corresponding character from the string “0123456789ABCDEFHJKLMNPRSTUVWXY”. For example, if the remainder is 0, the control character is 0, and if the remainder is 12, the control character is C.
A valid individual number is between 002 and 899. Individual numbers 900 to 999 are not in normal use and are used only for temporary or artificial PINs. Organizations such as insurance companies or hospitals sometimes use these temporary PINs if the individual is neither a Finnish citizen nor a permanent resident, or if the exact identity of the individual cannot be determined at the time. Artificial or temporary PINs are not intended for continuous, long-term use, and they are usually rejected by PIN validity checking algorithms.
Temporary PINs provide the same information about an individual’s birth date and sex as regular PINs. Temporary PINs can also be used safely for testing, because such a number cannot be linked to any real person.
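The control character rule can be sketched in base R. This is a minimal illustration and not the package's implementation. The function name `hetu_ctrl_char_sketch` is hypothetical, and the sketch assumes well-formed 11-character PINs in the DDMMYYCZZZQ layout.

```r
# Hypothetical sketch: compute the hetu control character Q from DDMMYY and ZZZ.
# Assumes well-formed 11-character PINs laid out as DDMMYYCZZZQ.
hetu_ctrl_char_sketch <- function(pin) {
  ctrl_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  # Join DDMMYY (characters 1-6) and ZZZ (characters 8-10) into nine digits
  digits <- paste0(substr(pin, 1, 6), substr(pin, 8, 10))
  # Nine-digit integers are exact as doubles, so %% gives the exact remainder
  remainder <- as.numeric(digits) %% 31
  # R vectors are 1-indexed: remainder 0 maps to ctrl_chars[1], i.e. "0"
  ctrl_chars[remainder + 1]
}

hetu_ctrl_char_sketch(c("111111-111C", "010101-0101", "010101A900R"))
# [1] "C" "1" "R"
```

For 111111-111C, 111111111 %% 31 is 12, which selects “C” as described above.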
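The individual number rules can be illustrated with a short base R sketch. It reads the sex from the parity of ZZZ and classifies the number as regular (002 to 899) or temporary (900 to 999). The function `classify_p_num_sketch` is hypothetical and is not part of the hetu package.

```r
# Hypothetical sketch: read sex and number type from the individual number ZZZ.
classify_p_num_sketch <- function(pin) {
  p_num <- as.integer(substr(pin, 8, 10))  # characters 8-10 hold ZZZ
  data.frame(
    hetu = pin,
    sex = ifelse(p_num %% 2 == 1, "Male", "Female"),  # odd = male, even = female
    regular = p_num >= 2 & p_num <= 899,              # normal range 002-899
    temporary = p_num >= 900,                         # 900-999 are temporary
    stringsAsFactors = FALSE
  )
}

classify_p_num_sketch(c("111111-111C", "010101-0101", "010101A900R"))
```

Individual numbers 000 and 001 are classified as neither regular nor temporary.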
The basic hetu function can be used to view the information included in a Finnish personal identification number. The output is a data frame:
example_pin <- "111111-111C"
hetu(example_pin)
#> hetu sex p.num ctrl.char date day month year century valid.pin
#> 1 111111-111C Male 111 C 1911-11-11 11 11 1911 - TRUE
The output can be made prettier, for example by using knitr:
knitr::kable(hetu(example_pin))
hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin |
---|---|---|---|---|---|---|---|---|---|
111111-111C | Male | 111 | C | 1911-11-11 | 11 | 11 | 1911 | - | TRUE |
The hetu function also accepts vectors with several identification numbers as input:
example_pins <- c("010101-0101", "111111-111C")
knitr::kable(hetu(example_pins))
hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin |
---|---|---|---|---|---|---|---|---|---|
010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE |
111111-111C | Male | 111 | C | 1911-11-11 | 11 | 11 | 1911 | - | TRUE |
The hetu function does not print warning messages if the input vector contains invalid PINs. The validity of each PIN can be checked in the valid.pin column:
hetu(c("010101-0102", "111311-111C", "010101-0101"))
#> hetu sex p.num ctrl.char date day month year century
#> 1 010101-0102 Female 010 2 1901-01-01 1 1 1901 -
#> 2 111311-111C Male 111 C <NA> 11 NA 1911 -
#> 3 010101-0101 Female 010 1 1901-01-01 1 1 1901 -
#> valid.pin
#> 1 FALSE
#> 2 FALSE
#> 3 TRUE
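The date fields can be derived roughly as in the base R sketch below, which shows why 111311-111C has no valid date (month 13). The sketch assumes only the century markers “+” (1800s), “-” (1900s) and “A” (2000s). The function `parse_hetu_sketch` is hypothetical. Unlike the hetu output above, it keeps the raw month value 13 instead of NA.

```r
# Hypothetical sketch: split a PIN into date fields and check that the date exists.
parse_hetu_sketch <- function(pin) {
  # Assumed century markers: "+" = 1800s, "-" = 1900s, "A" = 2000s
  century_base <- c("+" = 1800L, "-" = 1900L, "A" = 2000L)
  day <- as.integer(substr(pin, 1, 2))
  month <- as.integer(substr(pin, 3, 4))
  marker <- substr(pin, 7, 7)
  year <- unname(century_base[marker]) + as.integer(substr(pin, 5, 6))
  # With an explicit format, as.Date() returns NA when the date does not exist
  date <- as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
  data.frame(hetu = pin, date = date, day = day, month = month, year = year,
             century = marker, valid.date = !is.na(date), stringsAsFactors = FALSE)
}

parse_hetu_sketch(c("010101-0102", "111311-111C", "010101-0101"))
```
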
Information contained in the PIN can be extracted with the generic extract parameter. Valid values for extraction are hetu, sex, personal.number, ctrl.char, date, day, month, year, century, valid.pin and is.temp.
is.temp can be extracted only if allow.temp is set to TRUE. If allow.temp is FALSE (the default), temporary PINs are filtered out of the output, so is.temp would carry no information.
hetu(example_pins, extract = "sex")
#> [1] "Female" "Male"
hetu(example_pins, extract = "ctrl.char")
#> [1] "1" "C"
Some fields can also be extracted with specialized functions. For example, sex can be extracted with the hetu_sex function:
hetu_sex(example_pins)
#> [1] "Female" "Male"
The hetu_age function calculates age at the current date or at a given date. Age is given in years by default, or in another unit such as months with the timespan parameter:
hetu_age(example_pins)
#> The age in years has been calculated at 2022-05-20.
#> [1] 121 110
hetu_age(example_pins, date = "2012-01-01")
#> The age in years has been calculated at 2012-01-01.
#> [1] 111 100
hetu_age(example_pins, timespan = "months")
#> The age in months has been calculated at 2022-05-20.
#> [1] 1456 1326
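Age in completed months (and therefore in completed years) can be sketched in base R. The sketch counts whole months between the two dates and subtracts one if the day of the month has not yet been reached. The function `age_sketch` is hypothetical and is not necessarily how hetu_age works, although it gives the same values as the output above for these birth dates.

```r
# Hypothetical sketch: age in completed years or months at a reference date.
age_sketch <- function(birth, at = Sys.Date(), timespan = c("years", "months")) {
  timespan <- match.arg(timespan)
  b <- as.POSIXlt(as.Date(birth))
  a <- as.POSIXlt(as.Date(at))
  # Whole months elapsed; subtract one if this month's "birthday" is still ahead
  months <- (a$year - b$year) * 12 + (a$mon - b$mon) - (a$mday < b$mday)
  if (timespan == "months") months else months %/% 12
}

age_sketch(c("1901-01-01", "1911-11-11"), at = "2012-01-01")
# [1] 111 100
age_sketch(c("1901-01-01", "1911-11-11"), at = "2022-05-20", timespan = "months")
# [1] 1456 1326
```
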
Birth dates also have their own function, hetu_date:
hetu_date(example_pins)
#> [1] "1901-01-01" "1911-11-11"
The output of the basic hetu function includes the validity of each PIN, which can be extracted by calling hetu with extract = "valid.pin".
The validity of the PINs can also be determined by using the hetu_ctrl function, which produces a vector:
hetu_ctrl(c("010101-0101", "111111-111C")) # TRUE TRUE
#> [1] TRUE TRUE
hetu_ctrl("010101-1010") # FALSE
#> [1] FALSE
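A validator in the spirit of hetu_ctrl can be sketched in base R by combining the rules described earlier. It checks the overall shape with a regular expression, whether the date exists, the individual number range and the control character. The function `is_valid_pin_sketch` and its `allow_temp` argument are hypothetical. The sketch assumes only the century markers “+”, “-” and “A”. It returns FALSE, not NA, for temporary PINs that are not allowed, and it does not reject future birth dates.

```r
# Hypothetical sketch of a vectorized PIN validity check.
is_valid_pin_sketch <- function(pin, allow_temp = FALSE) {
  ctrl_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  century_base <- c("+" = 1800L, "-" = 1900L, "A" = 2000L)
  result <- rep(FALSE, length(pin))
  # 1. Shape: six digits, century marker, three digits, one control character
  ok <- grepl("^[0-9]{6}[-+A][0-9]{3}[0-9A-Y]$", pin)
  if (!any(ok)) return(result)
  p <- pin[ok]
  # 2. The birth date must exist (as.Date() returns NA otherwise)
  year <- unname(century_base[substr(p, 7, 7)]) + as.integer(substr(p, 5, 6))
  date <- as.Date(paste0(year, "-", substr(p, 3, 4), "-", substr(p, 1, 2)),
                  format = "%Y-%m-%d")
  # 3. Individual number: 002-899, or 002-999 when temporary PINs are allowed
  p_num <- as.integer(substr(p, 8, 10))
  num_ok <- if (allow_temp) p_num >= 2 else p_num >= 2 & p_num <= 899
  # 4. Control character: DDMMYYZZZ modulo 31
  remainder <- as.numeric(paste0(substr(p, 1, 6), substr(p, 8, 10))) %% 31
  ctrl_ok <- substr(p, 11, 11) == ctrl_chars[remainder + 1]
  result[ok] <- !is.na(date) & num_ok & ctrl_ok
  result
}

is_valid_pin_sketch(c("010101-0101", "111111-111C", "010101-1010"))
# [1]  TRUE  TRUE FALSE
```

Because the regular expression runs first, malformed strings such as "01011901-01010" are rejected before any numeric coercion.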
The package functions can be made to accept artificial or temporary personal identification numbers. Artificial and temporary PINs are handled like regular PINs when they are allowed through the allow.temp parameter:
example_temp_pin <- "010101A900R"
knitr::kable(hetu(example_temp_pin, allow.temp = TRUE))
hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin | is.temp |
---|---|---|---|---|---|---|---|---|---|---|
010101A900R | Female | 900 | R | 2001-01-01 | 1 | 1 | 2001 | A | TRUE | TRUE |
If allow.temp is not set to TRUE, a vector that mixes regular and temporary PINs returns only the regular PINs. Temporary PINs are omitted silently, without any visible error message, so users need to be careful when their data may contain temporary PINs.
If temporary PINs are not explicitly allowed and the input vector consists of temporary PINs only, the hetu function returns an error.
example_temp_pins <- c("010101A900R", "010101-0101")
hetu_ctrl("010101A900R", allow.temp = FALSE)
#> [1] NA
knitr::kable(hetu(example_temp_pins))
| hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin |
---|---|---|---|---|---|---|---|---|---|---|
2 | 010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE |
When allow.temp is set to TRUE, all PINs are handled as if they were regular PINs.
knitr::kable(hetu(example_temp_pins, allow.temp = TRUE))
hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin | is.temp |
---|---|---|---|---|---|---|---|---|---|---|
010101A900R | Female | 900 | R | 2001-01-01 | 1 | 1 | 2001 | A | TRUE | TRUE |
010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE | FALSE |
hetu_ctrl("010101A900R", allow.temp = TRUE)
#> [1] TRUE
Unless temporary PINs are explicitly allowed, the validation function hetu_ctrl does not return TRUE for artificial / temporary PINs. In the example above, hetu_ctrl("010101A900R", allow.temp = FALSE) returns NA.
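The allow.temp behaviour described above can be sketched in base R. With temporary PINs allowed, every PIN is kept and flagged. Otherwise, temporary PINs are dropped, and an input made up only of temporary PINs is an error. The function `filter_temp_sketch` is hypothetical. Unlike the package as described above, this sketch prints a message when it drops PINs, so the omission is not silent.

```r
# Hypothetical sketch of allow.temp-style handling of temporary PINs.
filter_temp_sketch <- function(pin, allow_temp = FALSE) {
  is_temp <- as.integer(substr(pin, 8, 10)) >= 900  # individual numbers 900-999
  out <- data.frame(hetu = pin, is.temp = is_temp, stringsAsFactors = FALSE)
  if (allow_temp) return(out)  # keep everything, flag temporary PINs
  if (all(is_temp)) stop("input contains only temporary PINs; use allow_temp = TRUE")
  if (any(is_temp)) message(sum(is_temp), " temporary PIN(s) dropped")
  # Subsetting keeps the original row names, e.g. "2", as in the tables above
  out[!is_temp, "hetu", drop = FALSE]
}

filter_temp_sketch(c("010101A900R", "010101-0101"))
filter_temp_sketch(c("010101A900R", "010101-0101"), allow_temp = TRUE)
```
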
Random PINs can be generated with the rhetu function:
rhetu(n = 4)
#> [1] "070502-3401" "030388-1862" "290391-7615" "151219A8600"
rhetu(n = 4, start.date = "1990-01-01", end.date = "2005-01-01")
#> [1] "151190-6358" "040494-121Y" "021297-2170" "280899-296L"
The expected proportion of males in the generated sample can be changed with the p.male parameter. The default is 0.4.
random_sample <- rhetu(n = 4, p.male = 0.8)
table(random_sample)
#> random_sample
#> 030799+449L 120845-060R 220783-518Y 260661-539R
#> 1 1 1 1
The proportion of artificial / temporary PINs is set with the p.temp parameter. The default is 0.0, meaning that no artificial / temporary PINs are generated by default.
temp_sample <- rhetu(n = 4, p.temp = 0.5)
table(hetu(temp_sample, allow.temp = TRUE, extract = "is.temp"))
#>
#> FALSE
#> 4
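The generation parameters can be illustrated with a base R sketch. It draws uniform birth dates in a range, picks an odd (male) or even (female) individual number, draws from 900 to 999 with probability p_temp, and appends the modulo-31 control character. The function `rhetu_sketch` and its argument names are hypothetical and are not the package's rhetu implementation. The sketch assumes only the century markers “+”, “-” and “A”.

```r
# Hypothetical sketch of random PIN generation.
rhetu_sketch <- function(n, start_date, end_date, p_male = 0.4, p_temp = 0) {
  ctrl_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  start <- as.Date(start_date)
  span <- as.integer(as.Date(end_date) - start)  # number of days in the range
  # Uniformly random birth dates between start_date and end_date (inclusive)
  dates <- start + sample.int(span + 1L, n, replace = TRUE) - 1L
  year <- as.integer(format(dates, "%Y"))
  marker <- ifelse(year < 1900, "+", ifelse(year < 2000, "-", "A"))
  male <- runif(n) < p_male
  temp <- runif(n) < p_temp
  p_num <- integer(n)
  for (i in seq_len(n)) {
    pool <- if (temp[i]) 900:999 else 2:899
    pool <- pool[(pool %% 2L == 1L) == male[i]]  # odd = male, even = female
    p_num[i] <- pool[sample.int(length(pool), 1L)]
  }
  ddmmyy <- format(dates, "%d%m%y")
  zzz <- sprintf("%03d", p_num)
  remainder <- as.numeric(paste0(ddmmyy, zzz)) %% 31
  paste0(ddmmyy, marker, zzz, ctrl_chars[remainder + 1])
}

set.seed(2022)
rhetu_sketch(4, "1990-01-01", "2005-01-01", p_male = 0.8)
```

Indexing with `sample.int()` avoids the `sample()` pitfall of treating a length-one vector as a range.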
In addition to the information mentioned in the section Extracting specific information, the user can choose to print additional columns that describe the checks performed on each PIN. The diagnostic checks produce TRUE or FALSE for the following categories: valid.p.num, valid.ctrl.char, correct.ctrl.char, valid.date, valid.day, valid.month, valid.year, valid.length and valid.century. FALSE means that the PIN failed that particular check.
diagnosis_example <- c("010101-0102", "111111-111Q",
                       "010101B0101", "320101-0101", "011301-0101",
                       "010101-01010", "010101-0011")
head(hetu(diagnosis_example, diagnostic = TRUE), 3)
#> hetu sex p.num ctrl.char date day month year century
#> 1 010101-0102 Female 010 2 1901-01-01 1 1 1901 -
#> 2 111111-111Q Male 111 Q 1911-11-11 11 11 1911 -
#> 3 010101B0101 Female 010 1 <NA> 1 1 NA B
#> valid.pin valid.p.num valid.ctrl.char correct.ctrl.char valid.date valid.day
#> 1 FALSE TRUE TRUE FALSE TRUE TRUE
#> 2 FALSE TRUE FALSE FALSE TRUE TRUE
#> 3 FALSE TRUE TRUE TRUE FALSE TRUE
#> valid.month valid.year valid.length valid.century
#> 1 TRUE TRUE TRUE TRUE
#> 2 TRUE TRUE TRUE TRUE
#> 3 TRUE TRUE TRUE FALSE
Diagnostic information can be examined more closely by subsetting the output or by using the separate hetu_diagnostic function. The user can print all diagnostic information for all PINs in the dataset:
tail(hetu_diagnostic(diagnosis_example), 3)
#> hetu is.temp valid.p.num valid.ctrl.char correct.ctrl.char valid.date
#> 5 011301-0101 FALSE TRUE TRUE FALSE FALSE
#> 6 010101-01010 FALSE TRUE TRUE TRUE TRUE
#> 7 010101-0011 FALSE FALSE TRUE FALSE TRUE
#> valid.day valid.month valid.year valid.length valid.century
#> 5 TRUE FALSE TRUE TRUE TRUE
#> 6 TRUE TRUE TRUE FALSE TRUE
#> 7 TRUE TRUE TRUE TRUE TRUE
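The individual diagnostic checks can be sketched in base R, with one logical column per check. The function `hetu_diagnostic_sketch` is hypothetical, and its checks are simplified assumptions rather than the package's code. It assumes only the century markers “+”, “-” and “A”. For the seven example PINs above, its columns agree with the package output shown.

```r
# Hypothetical sketch: one logical column per diagnostic check.
hetu_diagnostic_sketch <- function(pin) {
  ctrl_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  century_base <- c("+" = 1800L, "-" = 1900L, "A" = 2000L)
  num <- function(x) suppressWarnings(as.integer(x))  # bad digits become NA
  day <- num(substr(pin, 1, 2))
  month <- num(substr(pin, 3, 4))
  yy <- num(substr(pin, 5, 6))
  marker <- substr(pin, 7, 7)
  p_num <- num(substr(pin, 8, 10))
  ctrl <- substr(pin, 11, 11)
  year <- unname(century_base[marker]) + yy  # NA for an unknown marker
  date <- as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
  digits <- paste0(substr(pin, 1, 6), substr(pin, 8, 10))
  remainder <- suppressWarnings(as.numeric(digits)) %% 31
  expected <- ctrl_chars[remainder + 1]
  data.frame(
    hetu = pin,
    valid.p.num = !is.na(p_num) & p_num >= 2 & p_num <= 899,
    valid.ctrl.char = ctrl %in% ctrl_chars,
    correct.ctrl.char = !is.na(expected) & ctrl == expected,
    valid.date = !is.na(date),
    valid.day = !is.na(day) & day >= 1 & day <= 31,
    valid.month = !is.na(month) & month >= 1 & month <= 12,
    valid.year = !is.na(yy),
    valid.length = nchar(pin) == 11,
    valid.century = marker %in% names(century_base),
    stringsAsFactors = FALSE
  )
}

hetu_diagnostic_sketch(c("010101-0102", "111111-111Q", "010101B0101"))
```
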
By using the extract parameter, the user can choose which columns are printed in the output table. Valid extract values are listed in the function’s help file. Here the control character check is requested by its column name, correct.ctrl.char. The older name correct.checksum is rejected with the error “Trying to extract invalid diagnostic(s)”.
hetu_diagnostic(diagnosis_example, extract = c("valid.century", "correct.ctrl.char"))
Because of the way PINs are handled inside the hetu function, the diagnostic function can show unexpected warning messages or introduce NAs by coercion if the date part of the PIN is too long. In such cases the PIN may not be handled at all:
# Faulty example
hetu_diagnostic(c("01011901-01010"))
The package has also the ability to generate Finnish Business ID codes (y-tunnus) and check their validity. Unlike with personal identification numbers, no additional information can be extracted from Business IDs.
Similar to hetu PINs, random Finnish Business IDs (y-tunnus) can be generated with the rbid function:
bid_sample <- rbid(3)
bid_sample
#> [1] "0991107-0" "8377128-0" "1286283-9"
The validity of Finnish Business IDs can be checked with bid_ctrl, which works like hetu_ctrl:
bid_ctrl(c("0737546-2", "1572860-0")) # TRUE TRUE
#> [1] TRUE TRUE
bid_ctrl("0737546-1") # FALSE
#> [1] FALSE
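A Business ID check can be sketched in base R. The sketch assumes the commonly published y-tunnus check-digit rule, which this vignette does not describe and which bid_ctrl does not necessarily use. Under that rule, the seven digits are weighted by 7, 9, 10, 5, 8, 4 and 2 and summed, and the sum is taken modulo 11. A remainder of 0 gives check digit 0, a remainder of 1 is never issued, and any other remainder r gives 11 - r. The function names are hypothetical.

```r
# Hypothetical sketch: check digit for the first seven digits of a Business ID.
# Assumed rule: weights 7, 9, 10, 5, 8, 4, 2; sum modulo 11.
bid_check_digit_sketch <- function(base7) {
  weights <- c(7, 9, 10, 5, 8, 4, 2)
  vapply(base7, function(b) {
    digits <- as.integer(strsplit(b, "")[[1]])
    r <- sum(digits * weights) %% 11
    if (r == 0) 0L else if (r == 1) NA_integer_ else as.integer(11 - r)
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical sketch of a bid_ctrl-style validity check.
bid_ctrl_sketch <- function(bid) {
  ok <- grepl("^[0-9]{7}-[0-9]$", bid)  # seven digits, hyphen, check digit
  result <- rep(FALSE, length(bid))
  expected <- bid_check_digit_sketch(substr(bid[ok], 1, 7))
  result[ok] <- !is.na(expected) & expected == as.integer(substr(bid[ok], 9, 9))
  result
}

bid_ctrl_sketch(c("0737546-2", "1572860-0", "0737546-1"))
# [1]  TRUE  TRUE FALSE
```

For 0737546-2, the weighted sum is 196, and 196 %% 11 is 9, so the check digit is 11 - 9 = 2.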
Data frames generated by hetu function work well with tidyverse/dplyr workflows as well.
library(hetu)
library(tidyverse)
library(dplyr)
# Generate data for this example
hdat <- tibble(pin = rhetu(n = 4, start_date = "1990-01-01", end_date = "2005-01-01"))
hdat
# Extract all the hetu information to tibble format
hdat <- hdat %>%
  mutate(result = map(.x = pin, .f = hetu::hetu)) %>%
  unnest(cols = c(result))
hdat
This work can be freely used, modified and distributed under the open license specified in the DESCRIPTION file.
Kindly cite the work as follows:
citation("hetu")
#>
#> Kindly cite the hetu R package as follows:
#>
#> Pyry Kantanen, Mans Magnusson, Jussi Paananen and Leo Lahti (rOpenGov
#> 2022). hetu: Structural Handling of Finnish Personal Identity Codes.
#> R package version 1.0.7 URL: http://github.com/rOpenGov/hetu
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Misc{,
#> title = {hetu: Structural Handling of Finnish Personal Identity Codes},
#> author = {Pyry Kantanen and Mans Magnusson and Jussi Paananen and Leo Lahti},
#> url = {https://github.com/rOpenGov/hetu},
#> year = {2022},
#> note = {R package version 1.0.7},
#> }
#>
#> Many thanks for all contributors!
This vignette was created with:
sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/fi_FI.UTF-8/fi_FI.UTF-8/C/fi_FI.UTF-8/fi_FI.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] hetu_1.0.7
#>
#> loaded via a namespace (and not attached):
#> [1] lubridate_1.8.0 digest_0.6.29 R6_2.5.1 backports_1.4.1
#> [5] jsonlite_1.8.0 magrittr_2.0.3 evaluate_0.15 highr_0.9
#> [9] stringi_1.7.6 rlang_1.0.2 cli_3.3.0 jquerylib_0.1.4
#> [13] bslib_0.3.1 generics_0.1.2 checkmate_2.1.0 rmarkdown_2.14
#> [17] tools_4.2.0 stringr_1.4.0 parallel_4.2.0 xfun_0.31
#> [21] yaml_2.3.5 fastmap_1.1.0 compiler_4.2.0 htmltools_0.5.2
#> [25] knitr_1.39 sass_0.4.1