
Maximum information preservable by supervised nominal lumping
Source: R/supervised.R
maximum_mutual_information_nominal_supervised.Rd
Calculates the maximum amount of mutual information between a lumped nominal covariate and a discrete outcome that can be preserved by lumping.
Usage
maximum_mutual_information_nominal_supervised(
  joint_counts,
  threshold,
  adj_matrix = NULL,
  verbose = FALSE
)
Arguments
- joint_counts
Named numeric matrix with one row per level and one column per outcome category. Row names must identify the levels. Entry (k, y) is the number of observations with covariate level k and outcome y.
- threshold
Minimum number of samples each lumped level must contain.
- adj_matrix
Adjacency matrix of the preference graph. Default: a complete graph, allowing all lumpings.
- verbose
Whether to print diagnostic messages. Default: FALSE.
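A joint_counts matrix of the shape described above can be built from raw observations with base R. The following is a minimal sketch, not part of the package: the data frame obs and its columns level and outcome are hypothetical names chosen for illustration.

```r
# Hypothetical raw data: one row per observation.
obs <- data.frame(
  level   = rep(c("A", "B", "C"), times = c(12, 3, 3)),
  outcome = c(rep("y0", 8), rep("y1", 4),   # level A
              rep("y0", 2), "y1",           # level B
              "y0", rep("y1", 2))           # level C
)

# Cross-tabulate levels against outcomes, then convert the table
# into a plain numeric matrix that keeps the row and column names.
tab <- table(obs$level, obs$outcome)
joint_counts <- matrix(
  as.numeric(tab),
  nrow = nrow(tab),
  dimnames = list(rownames(tab), colnames(tab))
)
```

With this data the result has rows A, B, C and columns y0, y1, the same matrix that is used in the Examples section.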
Value
A list containing information about the optimal lumping:
- mutual_information
Mutual information between the lumped covariate and the outcome, in nats.
- loss
Mutual information lost by the lumping, relative to the original (unlumped) covariate.
- lumping
A list of character vectors, where each vector contains the names of the original levels that have been lumped together.
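To make the returned quantities concrete, the sketch below recomputes them by hand for one fixed lumping. It is an illustration under stated assumptions, not the package's implementation: mi_nats() and lump_counts() are hypothetical helpers, and the lumping {A}, {B, C} is chosen by hand rather than found by any search. It assumes that loss is the mutual information of the original matrix minus that of the lumped matrix.

```r
# Hypothetical helper: mutual information (in nats) of a joint count matrix.
mi_nats <- function(counts) {
  p <- counts / sum(counts)                # joint distribution p(k, y)
  indep <- outer(rowSums(p), colSums(p))   # product of marginals p(k) * p(y)
  nz <- p > 0                              # empty cells contribute nothing
  sum(p[nz] * log(p[nz] / indep[nz]))      # log() is the natural log, so nats
}

# Hypothetical helper: add up the rows of `counts` within each lumped group.
lump_counts <- function(counts, lumping) {
  lumped <- do.call(
    rbind,
    lapply(lumping, function(g) colSums(counts[g, , drop = FALSE]))
  )
  rownames(lumped) <- vapply(lumping, paste, character(1), collapse = "+")
  lumped
}

# Same matrix as in the Examples section.
joint_counts <- matrix(
  c(8, 2, 1, 4, 1, 2),
  nrow = 3,
  dimnames = list(c("A", "B", "C"), c("y0", "y1"))
)

# A hand-picked lumping in the same format as the `lumping` element.
lumping <- list("A", c("B", "C"))
lumped <- lump_counts(joint_counts, lumping)

mi_full   <- mi_nats(joint_counts)   # about 0.0317, as in the Examples output
mi_lumped <- mi_nats(lumped)
loss      <- mi_full - mi_lumped     # information given up by this lumping

# Check that every lumped level holds at least `threshold` samples.
threshold <- 4
meets_threshold <- all(rowSums(lumped) >= threshold)
```

Merging levels can never increase the mutual information with the outcome, so a loss computed this way is non-negative up to floating-point error.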
Details
The problem is NP-hard, and the implementation here has time complexity \(O\left(2^{2^m}\right)\), where \(m\) is the number of levels in the nominal variable. Expect run time to grow very quickly as levels are added.
See also
maximum_mutual_information_nominal() for the unsupervised version.
maximum_mutual_information_hierarchical_supervised() for a version that exploits hierarchical structure to speed up execution.
maximum_mutual_information_nominal_supervised_continuous() for a version that accepts a continuous outcome.
Examples
joint_counts <- matrix(
  c(8, 2, 1, 4, 1, 2),
  nrow = 3,
  dimnames = list(c("A", "B", "C"), c("y0", "y1"))
)
maximum_mutual_information_nominal_supervised(joint_counts, threshold = 3)
#> $mutual_information
#> [1] 0.03173431
#>
#> $loss
#> [1] -2.220446e-16
#>
#> $lumping
#> $lumping[[1]]
#> [1] "C"
#>
#> $lumping[[2]]
#> [1] "B"
#>
#> $lumping[[3]]
#> [1] "A"
#>
#>