Combining correspondence tables

Overview

A correspondence table serves as a translation between two statistical classifications. When a correspondence table between two classifications does not yet exist, but both are linked to one or more intermediate classifications through existing correspondence tables, a new correspondence table can be generated automatically.

For the general case, where classifications \(A\) and \(B\) are indirectly linked via one or more intermediate classifications \(C_1, \dots ,C_k\), the newCorrespondenceTable() function can automatically generate a new correspondence table.
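The idea of chaining two correspondence tables through an intermediate classification can be illustrated with a plain join. The following is only a minimal sketch in base R: the codes and the column names A, C and B are invented for illustration, and it is not the algorithm implemented in newCorrespondenceTable(), which also computes flags and handles redundancy.

```r
# Toy correspondence tables; all codes and column names are invented.
AC <- data.frame(A = c("a1", "a1", "a2"),
                 C = c("c1", "c2", "c3"),
                 stringsAsFactors = FALSE)
CB <- data.frame(C = c("c1", "c2", "c3"),
                 B = c("b1", "b1", "b2"),
                 stringsAsFactors = FALSE)

# Join on the intermediate classification C to obtain candidate A:B links.
ACB <- merge(AC, CB, by = "C")[, c("A", "C", "B")]

# Distinct A:B pairs; the pair a1:b1 is reached through both c1 and c2.
pairsAB <- unique(ACB[, c("A", "B")])
```

In this toy case the pair a1:b1 appears twice in ACB, once per intermediate code, which is the kind of situation that the redundancy flags discussed in the examples are meant to signal.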

A special case occurs when a classification \(A\) is updated to a new version \(A^*\) (with the correspondence table \(A:A^*\) assumed to have been created as part of this update), and a correspondence table \(A:B\) between the old version of \(A\) and another classification of interest \(B\) already exists.

Here, the updateCorrespondenceTable() function can be used to automatically generate the new correspondence table \(A^*:B\). (The newCorrespondenceTable() function could also be applied to achieve this, but the updateCorrespondenceTable() function takes into consideration the fact that \(A\) and \(A^*\) are two versions of the same classification, and is therefore recommended for this updating scenario.)

Input

In the case of newCorrespondenceTable(), the number of intermediate classifications is variable.
For this reason, the function accepts a flexible, matrix-like input structure that represents the relationships between classifications and their correspondence tables.

The input must be provided either:

In both cases, the diagonal elements of the structure correspond to classification tables (e.g. \(A\), \(B\), \(C\)), while the off-diagonal elements represent the correspondence tables linking consecutive classifications (e.g. \(A:B\), \(B:C\)).

For example, to generate a correspondence table between classifications \(A\) and \(C\) from the correspondence tables \(A:B\) and \(B:C\), the input structure can be represented schematically as follows:

\[ \begin{bmatrix} A & A\!:\!B & \\ & B & B\!:\!C \\ & & C \end{bmatrix} \]

This representation naturally extends to cases with multiple intermediate classifications.
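As a rough illustration of how this structure grows with the number of intermediate classifications, the sketch below builds the layout for a chain with two intermediate classifications. The helper buildLayout() and all file names are hypothetical and not part of the package; in a CSV-based layout each cell would hold a full path rather than a bare file name.

```r
# Hypothetical helper: build the matrix-like layout for a chain of classifications.
# classFiles: files of A, C1, ..., Ck, B (in chain order)
# corrFiles:  files of A:C1, C1:C2, ..., Ck:B (one fewer than classFiles)
buildLayout <- function(classFiles, corrFiles) {
  n <- length(classFiles)
  stopifnot(length(corrFiles) == n - 1)
  layout <- matrix("", nrow = n, ncol = n)
  # Classifications go on the diagonal.
  layout[cbind(seq_len(n), seq_len(n))] <- classFiles
  # Correspondence tables go directly above the diagonal.
  layout[cbind(seq_len(n - 1), seq_len(n - 1) + 1)] <- corrFiles
  layout
}

# Placeholder file names, invented for illustration.
layout <- buildLayout(
  classFiles = c("A.csv", "C1.csv", "C2.csv", "B.csv"),
  corrFiles  = c("A_C1.csv", "C1_C2.csv", "C2_B.csv")
)

# Write the layout without header or row names.
out <- tempfile(fileext = ".csv")
write.table(layout, file = out, row.names = FALSE, col.names = FALSE, sep = ",")
```

All cells other than the diagonal and the entries immediately above it stay empty, matching the schematic matrix.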

The input for updateCorrespondenceTable() simply requires the classifications (\(A, A^*\) and \(B\)) and correspondence tables (\(A:B\) and \(A:A^*\)) as data frames.

Output

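As output, both newCorrespondenceTable() and updateCorrespondenceTable() return a list containing:

  • as first element, the new correspondence table, including the flag columns discussed in the examples;
  • as second element, a data frame with the names of the classifications involved.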

Helper for the examples

When newCorrespondenceTable() is used with a CSV-based input structure, the CSV file that specifies the input layout must contain full file paths to the referenced CSV files, rather than file names alone. Accordingly, in the sample input, the file names appearing in the CSV table cells must be prefixed with their full path.

To streamline this task, the utility function fullPath, defined below, is used in all the following examples.


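# Temporary directory that will hold the rewritten layout file
tmp_dir <- tempdir()

# Read the layout file CSVraw shipped with the package, prefix every non-empty
# cell with the full path of the referenced file, and write the result to
# CSVappended in tmp_dir
fullPath <- function(CSVraw, CSVappended) {
  NamesCsv <- system.file("extdata/test", CSVraw, package = "correspondenceTables")
  A <- read.csv(NamesCsv, header = FALSE, sep = ",")
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(A))) {
      if (A[i, j] != "") {
        A[i, j] <- system.file("extdata/test", A[i, j], package = "correspondenceTables")
      }
    }
  }
  write.table(x = A, file = file.path(tmp_dir, CSVappended),
              row.names = FALSE, col.names = FALSE, sep = ",")
  return(A)
}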

Creating correspondence tables: general case using newCorrespondenceTable()

Example 1: ISIC Rev. 4 : CPA Ver. 2.1 (via CPC Ver. 2.1)

fullPath("names1.csv", "names.csv")

Execute the following code to apply newCorrespondenceTable() and generate the correspondence table linking ISIC Rev. 4 (classification A) to CPA Ver. 2.1 (classification B) through the intermediate classification CPC Ver. 2.1. Because no trimming is applied (Redundancy_trim = FALSE), redundant records are retained and shown together with the redundancy flags.

NCT <- newCorrespondenceTable(
        Tables = file.path(tmp_dir, "names.csv"),
        Reference = "A",
        MismatchTolerance = 0.5,
        Redundancy_trim = FALSE,
        Progress = FALSE
)

knitr::kable(
  (NCT[[1]][3748:3753, 1:9]),
  caption = "ISIC Rev. 4 to CPA Ver. 2.1 (via CPC Ver. 2.1): Subsample of the new Correspondence Table",
  align = "c"
)
ISIC Rev. 4 to CPA Ver. 2.1 (via CPC Ver. 2.1): Subsample of the new Correspondence Table
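|      | ISIC Rev. 4 | CPC 2.1 | CPA 2.1  | Review | Redundancy | Redundancy_keep | Unmatched | NoMatchFromA | NoMatchFromB |
|------|-------------|---------|----------|--------|------------|-----------------|-----------|--------------|--------------|
| 3748 | 1030        | 21495   | 10.39.23 | 0      | 0          | 0               | 0         | 0            | 0            |
| 3749 | 1030        | 21496   | 10.39.24 | 0      | 0          | 0               | 0         | 0            | 0            |
| 3750 | 1030        | 21429   | 10.39.25 | 0      | 1          | 1               | 0         | 0            | 0            |
| 3751 | 1030        | 21421   | 10.39.25 | 0      | 1          | 0               | 0         | 0            | 0            |
| 3752 | 1030        | 21424   | 10.39.25 | 0      | 1          | 0               | 0         | 0            | 0            |
| 3753 | 1030        | 21422   | 10.39.25 | 0      | 1          | 0               | 0         | 0            | 0            |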

The table above represents a subset of the correspondence table generated in this example. Each row represents a candidate correspondence between an ISIC code and a CPA code, possibly mediated by one or more intermediate classifications.

Here, the ISIC code 1030 is linked to several CPA codes:

  • The rows linking 1030 to 10.39.23 and 10.39.24 are unique and unambiguous.
    These rows have Redundancy = 0, Unmatched = 0, and no review or mismatch flags set.

  • The CPA code 10.39.25 appears multiple times in combination with the same ISIC code 1030, via different CPC codes.
    These rows are therefore flagged with Redundancy = 1.

When Redundancy_trim = FALSE, all redundant rows are retained and an additional column, Redundancy_keep, is included:

  • Redundancy_keep = 1 identifies the record that would be kept if redundancy trimming were applied.
  • Rows with Redundancy_keep = 0 represent redundant alternatives.
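The relationship between the two flags can be checked on the rows of the subsample. The sketch below copies those six rows into a small data frame (with simplified, hypothetical column names) and selects the non-redundant rows plus the flagged representative of the redundant group. It only illustrates how the flags relate to each other; it is not meant to reproduce the output that the function returns when Redundancy_trim = TRUE.

```r
# Rows copied from the subsample shown above; column names are simplified.
rows <- data.frame(
  ISIC = rep("1030", 6),
  CPC  = c("21495", "21496", "21429", "21421", "21424", "21422"),
  CPA  = c("10.39.23", "10.39.24", rep("10.39.25", 4)),
  Redundancy      = c(0, 0, 1, 1, 1, 1),
  Redundancy_keep = c(0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Non-redundant rows, plus the one representative of each redundant group.
kept <- rows[rows$Redundancy == 0 | rows$Redundancy_keep == 1, ]

# Each ISIC:CPA pair now appears exactly once.
pairsUnique <- !any(duplicated(kept[, c("ISIC", "CPA")]))
```

Selecting on Redundancy == 0 | Redundancy_keep == 1 leaves one row per ISIC:CPA pair, which is consistent with the description of Redundancy_keep.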

All rows in this example have Unmatched = 0, indicating that each ISIC code is matched to at least one CPA code and vice versa.
Similarly, NoMatchFromA = 0 and NoMatchFromB = 0 show that no codes from the original classification tables are missing from the correspondence tables involved in the construction.

Finally, the Review flag is equal to 0 for all rows, indicating that given the selected reference classification, no hierarchical inconsistencies are detected.

knitr::kable(
  head(NCT[[2]]),
  caption = "ISIC Rev. 4 to CPA Ver. 2.1 (via CPC Ver. 2.1): Names of the classifications involved",
  align = "c"
)
ISIC Rev. 4 to CPA Ver. 2.1 (via CPC Ver. 2.1): Names of the classifications involved
Classification: Name
A: ISIC Rev. 4
C1: CPC 2.1
B: CPA 2.1

The table above is the second element returned by newCorrespondenceTable(), which is simply a data frame containing the names of all classifications involved.

Example 2: NACE Rev. 2 : SITC 4 (via CPA Ver. 2.1 and CN 2022), many-to-many case.

fullPath("names4.csv", "names.csv")

Execute the following code to apply newCorrespondenceTable() and generate the correspondence table linking NACE Rev. 2 (classification A) to SITC 4 (classification B) through the intermediate classifications CPA Ver. 2.1 and CN 2022. With Redundancy_trim = TRUE, redundant records are removed so that exactly one record is kept for each unique combination.

NCT <- newCorrespondenceTable(
        Tables = file.path(tmp_dir, "names.csv"),
        Reference = "none",
        MismatchTolerance = 0.96,
        Redundancy_trim = TRUE,
        Progress = FALSE
      )



knitr::kable(
  head(NCT[[1]][5442:5450, 1:8]),
  caption = "NACE Rev. 2 : SITC 4 (via CPA Ver. 2.1 and CN 2022): Subsample of the new Correspondence Table",
  align = "c"
)
NACE Rev. 2 : SITC 4 (via CPA Ver. 2.1 and CN 2022): Subsample of the new Correspondence Table
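|      | NACE Rev. 2 | CPA 2.1  | CN 2022  | SITC4 | Redundancy | Unmatched | NoMatchFromA | NoMatchFromB |
|------|-------------|----------|----------|-------|------------|-----------|--------------|--------------|
| 5442 | 28.41       | 28.41.24 | 84623210 | 73314 | 0          | 0         | 0            | 0            |
| 5443 | 28.41       | 28.41.24 | 84623290 | 73315 | 0          | 0         | 0            | 0            |
| 5444 | 28.41       | 28.41.32 | 84624200 | 73316 | 0          | 0         | 0            | 0            |
| 5445 | 28.41       | 28.41.32 | 84624900 | 73317 | 0          | 0         | 0            | 0            |
| 5446 | 28.41       | 28.41.33 | Multiple | 73318 | 1          | 0         | 0            | 0            |
| 5447 | 28.41       | Multiple | Multiple | 73399 | 1          | 0         | 0            | 0            |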

Also in this case, the table above represents a subset of the correspondence table generated in this example. Each row corresponds to a correspondence between a NACE code and a SITC code, possibly mediated by multiple intermediate classifications.

In this example, the NACE code 28.41 is mapped to several SITC codes:

  • The first four rows represent unique and unambiguous correspondences, where specific CPA and CN codes are associated with specific SITC codes.
    These rows have Redundancy = 0 and Unmatched = 0, indicating that each of these NACE–SITC pairs is reached through a single chain of CPA and CN codes.

  • The last two rows are flagged with Redundancy = 1.
    In these cases, multiple intermediate codes (in CPA and/or CN) contribute to the same NACE–SITC mapping. As a result, the corresponding intermediate classification values are reported as "Multiple".
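The effect described above can be mimicked on toy data. The following sketch is an assumption about the general idea, not the package's implementation: it groups candidate links by the A:B pair and, where a pair is reached through more than one intermediate code, reports the intermediate value as "Multiple". The data, the single intermediate column C1 and the helper collapsePair() are all invented for illustration.

```r
# Toy candidate links with one intermediate classification; codes are invented.
links <- data.frame(
  A  = c("a1", "a1", "a1"),
  C1 = c("c1", "c2", "c3"),
  B  = c("b1", "b2", "b2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse all rows that share the same A:B pair.
collapsePair <- function(g) {
  multi <- length(unique(g$C1)) > 1
  data.frame(
    A  = g$A[1],
    C1 = if (multi) "Multiple" else g$C1[1],
    B  = g$B[1],
    Redundancy = as.integer(nrow(g) > 1),
    stringsAsFactors = FALSE
  )
}

# One group per A:B pair, then one output row per group.
groups  <- split(links, list(links$A, links$B), drop = TRUE)
trimmed <- do.call(rbind, lapply(groups, collapsePair))
rownames(trimmed) <- NULL
```

Here the pair a1:b2 is reached through both c2 and c3, so it is kept as a single row with C1 set to "Multiple" and Redundancy = 1, while a1:b1 keeps its only intermediate code.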

All rows have Unmatched = 0, indicating that each correspondence links a valid NACE code to a valid SITC code.
Additionally, NoMatchFromA = 0 and NoMatchFromB = 0 for all rows confirm that no classification codes are missing from the correspondence tables used to construct the result.

knitr::kable(
  head(NCT[[2]]),
  caption = "NACE Rev. 2 : SITC 4 (via CPA Ver. 2.1 and CN 2022): Names of the classifications involved",
  align = "c"
)
NACE Rev. 2 : SITC 4 (via CPA Ver. 2.1 and CN 2022): Names of the classifications involved
Classification: Name
A: NACE Rev. 2
C1: CPA 2.1
C2: CN 2022
B: SITC4

The table above corresponds to the second element returned by newCorrespondenceTable and is a data frame containing the names of all the classifications involved in the process.

Updating correspondence tables using updateCorrespondenceTable()

Example 3: Updating CN 2021 : CPA Ver. 2.1 (triggered by CN update)

Execute the following code to read the required input files into data frames.

A <- read.csv(
  system.file("extdata/test", "CN2021.csv", package = "correspondenceTables"),
  colClasses = "character"
)

AStar <- read.csv(
  system.file("extdata/test", "CN2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)

B <- read.csv(
  system.file("extdata/test", "CPA21.csv", package = "correspondenceTables"),
  colClasses = "character"
)

AB <- read.csv(
  system.file("extdata/test", "CN2021_CPA21.csv", package = "correspondenceTables"),
  colClasses = "character"
)

AAStar <- read.csv(
  system.file("extdata/test", "CN2021_CN2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)

Execute the following code to apply updateCorrespondenceTable() and generate the updated correspondence table. In this case the classification CN 2021 (A) has been updated to CN 2022 (A*), and the correspondence to CPA Ver. 2.1 (B) is revised accordingly. With Redundancy_trim = TRUE, redundant records are removed so that exactly one record is kept for each unique combination.

UPC <- updateCorrespondenceTable(
  A = A,
  B = B,
  AStar = AStar,
  AB = AB,
  AAStar = AAStar,
  Reference = "B",
  MismatchToleranceB = 0.4,
  MismatchToleranceAStar = 0.4,
  Redundancy_trim = TRUE
)


knitr::kable(
  (UPC[[1]][7950:7955, 1:11]),
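  caption = "Updating CN 2021 : CPA Ver. 2.1 (triggered by CN update): Subsample of the new Correspondence Table",
  align = "c"
)
Updating CN 2021 : CPA Ver. 2.1 (triggered by CN update): Subsample of the new Correspondence Table
|      | CN.2021  | CN.2022  | CPA.2.1  | CodeChange | Review | Redundancy | NoMatchToAStar | NoMatchToB | NoMatchFromAStar | NoMatchFromB | LabelChange |
|------|----------|----------|----------|------------|--------|------------|----------------|------------|------------------|--------------|-------------|
| 7950 | 84148080 | 84148080 | 28.13.28 | 1          | 0      | 0          | 0              | 0          | 0                | 0            | 1           |
| 7951 | 84149000 | 84149000 | 28.13.32 | 1          | 1      | 0          | 0              | 0          | 0                | 0            | 1           |
| 7952 | 84219990 | 84149000 | 28.29.82 | 1          | 1      | 0          | 0              | 0          | 0                | 0            | 1           |
| 7953 | 84151010 | 84151010 | 28.25.12 | 0          | 0      | 0          | 0              | 0          | 0                | 0            | 0           |
| 7954 | 84151090 | 84151090 | 28.25.12 | 0          | 0      | 0          | 0              | 0          | 0                | 0            | 0           |
| 7955 | 84152000 | 84152000 | 28.25.12 | 0          | 0      | 0          | 0              | 0          | 0                | 0            | 0           |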

The table above represents a subset of the correspondence table generated in this example. Each row links a CN 2022 code to a CPA 2.1 code and reflects changes from the previous version.

In this example:

  • The first three rows are flagged with CodeChange = 1, indicating that the original CN 2021 codes are associated with updated CN 2022 codes in a way that differs from the previous mapping.
    These rows also have LabelChange = 1, meaning that the labels of the corresponding CN codes have changed between versions.

  • Rows with Review = 1 (here, rows 7951 and 7952) indicate potential hierarchical inconsistencies with respect to the selected reference classification, and therefore require manual inspection.

  • The remaining rows have CodeChange = 0 and LabelChange = 0, showing that both the code and its label remain unchanged between CN 2021 and CN 2022 for the given correspondence to CPA 2.1.

All rows have Redundancy = 0, meaning that each CN 2022–CPA 2.1 combination appears only once in the updated correspondence table.
Similarly, NoMatchToAStar = 0 and NoMatchToB = 0 indicate that each row contains valid codes for both CN 2022 and CPA 2.1.

Finally, the flags NoMatchFromAStar = 0 and NoMatchFromB = 0 for all rows confirm that every code appearing in the updated correspondence is consistently represented in both the updated classification table and the underlying correspondence tables.
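In practice, the flags are typically used to narrow down the rows that need attention. The sketch below copies the six rows of the subsample into a small data frame (with simplified, hypothetical column names and only the flags discussed here) and extracts the rows flagged for review and the rows affected by a code or label change.

```r
# Rows copied from the subsample shown above; column names are simplified.
upd <- data.frame(
  CN2021 = c("84148080", "84149000", "84219990", "84151010", "84151090", "84152000"),
  CN2022 = c("84148080", "84149000", "84149000", "84151010", "84151090", "84152000"),
  CPA21  = c("28.13.28", "28.13.32", "28.29.82", "28.25.12", "28.25.12", "28.25.12"),
  CodeChange  = c(1, 1, 1, 0, 0, 0),
  Review      = c(0, 1, 1, 0, 0, 0),
  LabelChange = c(1, 1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Rows that require manual inspection.
toReview <- upd[upd$Review == 1, ]

# Rows affected by a code change or a label change.
changed <- upd[upd$CodeChange == 1 | upd$LabelChange == 1, ]
```

The same subsetting pattern could be applied to the first element of the list returned by the function, using the column names as they appear in that data frame.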

knitr::kable(
  head(UPC[[2]]),
  caption = "Updating CN 2021 : CPA Ver. 2.1  (triggered by CN update): Names of the classifications involved",
  align = "c",
  col.names = "Classification: Name"
)
Updating CN 2021 : CPA Ver. 2.1 (triggered by CN update): Names of the classifications involved
Classification: Name
A: CN.2021
B: CPA.2.1
AStar: CN.2022

The table above is the second element returned by updateCorrespondenceTable(), which is simply a data frame containing the names of all classifications involved.

Example 4: Updating NAICS : NACE (triggered by NAICS update)

Execute the following code to read the required input files into data frames.

A <- read.csv(
  system.file("extdata/test", "NAICS2017.csv", package = "correspondenceTables"),
  colClasses = "character"
)

AStar <- read.csv(
  system.file("extdata/test", "NAICS2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)

B <- read.csv(
  system.file("extdata/test", "NACE.csv", package = "correspondenceTables"),
  colClasses = "character"
)

AB <- read.csv(
  system.file("extdata/test", "NAICS2017_NACE.csv", package = "correspondenceTables"),
  colClasses = "character"
)

AAStar <- read.csv(
  system.file("extdata/test", "NAICS2017_NAICS2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)

Execute the following code to apply updateCorrespondenceTable() and generate the updated correspondence table. In this case the classification NAICS 2017 (A) has been updated to NAICS 2022 (A*), and the correspondence to NACE Rev. 2 (B) is revised accordingly. With Redundancy_trim = TRUE, redundant records are removed so that exactly one record is kept for each unique combination.


UPC3 <- updateCorrespondenceTable(
  A = A,
  B = B,
  AStar = AStar,
  AB = AB,
  AAStar = AAStar,
  Reference = "none",
  MismatchToleranceB = 0.5,
  MismatchToleranceAStar = 0.8,
  Redundancy_trim = TRUE
)


knitr::kable(
  head(UPC3[[1]][1208:1218, 1:10]),
  caption = "Updating NAICS : NACE (triggered by NAICS update): Subsample of the new Correspondence Table",
  align = "c"
)
Updating NAICS : NACE (triggered by NAICS update): Subsample of the new Correspondence Table
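|      | NAICS.2017 | NAICS.2022 | NACE.Rev..2 | CodeChange | Redundancy | NoMatchToAStar | NoMatchToB | NoMatchFromAStar | NoMatchFromB | LabelChange |
|------|------------|------------|-------------|------------|------------|----------------|------------|------------------|--------------|-------------|
| 1208 | 332313     | 332313     | 25.11       | 0          | 0          | 0              | 0          | 0                | 0            | 0           |
| 1209 | 332313     | 332313     | 25.29       | 0          | 0          | 0              | 0          | 0                | 0            | 0           |
| 1210 | 332313     | 332313     | 25.30       | 0          | 0          | 0              | 0          | 0                | 0            | 0           |
| 1211 | 332313     | 332313     | 28.22       | 0          | 0          | 0              | 0          | 0                | 0            | 0           |
| 1212 | 332313     | 332313     | 28.91       | 0          | 0          | 0              | 0          | 0                | 0            | 0           |
| 1213 | 332313     | 332313     | 30.11       | 0          | 0          | 0              | 0          | 0                | 0            | 0           |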

The table above represents a subset of the correspondence table generated in this example. Each row represents a candidate correspondence between a NAICS 2022 code and a NACE Rev. 2 code, derived from the previous version of the classification (NAICS 2017).

In this example:

  • The NAICS code 332313 is unchanged between NAICS 2017 and NAICS 2022, as indicated by CodeChange = 0 for all rows. This shows that the classification update did not introduce any code-level changes for this activity.

  • The same NAICS code 332313 is mapped to multiple NACE Rev. 2 codes (25.11, 25.29, 25.30, 28.22, 28.91, 30.11), reflecting a one-to-many correspondence that already existed and remains valid after the update.

  • All rows have Redundancy = 0, meaning that each NAICS 2022–NACE Rev. 2 combination appears only once in the updated correspondence table.

  • The flags NoMatchToAStar = 0 and NoMatchToB = 0 indicate that every row contains valid and consistent codes for both the updated classification (NAICS 2022) and the target classification (NACE Rev. 2).

  • Similarly, NoMatchFromAStar = 0 and NoMatchFromB = 0 confirm that all codes appearing in the updated correspondence are present in the respective classification tables and supported by the underlying correspondence tables.

  • Finally, LabelChange = 0 for all rows shows that the labels associated with the NAICS codes are identical between the 2017 and 2022 versions.
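One-to-many correspondences such as the one described above can be listed by grouping the target codes by source code. The sketch below uses the six rows of the subsample, copied into a data frame with simplified, hypothetical column names.

```r
# Rows copied from the subsample shown above; column names are simplified.
naics <- data.frame(
  NAICS2022 = rep("332313", 6),
  NACE      = c("25.11", "25.29", "25.30", "28.22", "28.91", "30.11"),
  stringsAsFactors = FALSE
)

# NACE codes grouped by NAICS 2022 code.
targets <- split(naics$NACE, naics$NAICS2022)

# Number of NACE codes per NAICS code, and the NAICS codes with more than one.
nTargets  <- lengths(targets)
oneToMany <- names(nTargets)[nTargets > 1]
```

Applied to a full correspondence table, the same grouping would give, for each NAICS 2022 code, the set of NACE Rev. 2 codes it is linked to.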


knitr::kable(
  head(UPC3[[2]]),
  caption = "Updating NAICS : NACE (triggered by NAICS update): Names of the classifications involved",
  align = "c",
  col.names = "Classification: Name"
)
Updating NAICS : NACE (triggered by NAICS update): Names of the classifications involved
Classification: Name
A: NAICS.2017
B: NACE.Rev..2
AStar: NAICS.2022

The table above corresponds to the second element returned by updateCorrespondenceTable and is a data frame containing the names of all relevant classifications.