| Title: | Visual Diagnostics for Multiple Imputation |
|---|---|
| Description: | A comprehensive suite of static and interactive visual diagnostics for assessing the quality of multiply-imputed data obtained from packages such as 'mixgb' and 'mice'. The package supports inspection of distributional characteristics, diagnostics based on masking observed values and comparing them with re-imputed values, and convergence diagnostics. |
| Authors: | Yongshi Deng [aut, cre] (ORCID: <https://orcid.org/0000-0001-5845-859X>), Thomas Lumley [ths] |
| Maintainer: | Yongshi Deng <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.9.5 |
| Built: | 2026-06-04 08:56:17 UTC |
| Source: | https://github.com/agnesdeng/vismi |
A small precomputed list object containing 5 imputed datasets generated by 'mixgb::mixgb()' on the 'newborn' example data. This dataset is included so that users can run plotting examples without installing 'mixgb'.
data(imp_newborn)
A list of 5 data.frames (each a completed dataset) created by 'mixgb::mixgb()' in development.
Generated during package development with 'mixgb::mixgb()'.
A small precomputed list object containing 5 imputed datasets generated by 'mixgb::mixgb()' on the 'nhanes3' example data. This dataset is included so that users can run plotting examples without installing 'mixgb'.
data(imp_nhanes3)
A list of 5 data.frames (each a completed dataset) created by 'mixgb::mixgb()' in development.
Generated during package development with 'mixgb::mixgb()'.
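The imputed-list format described above (a list of completed data frames) can be sanity-checked with base R before plotting. The following is a minimal sketch on toy data; `original`, `imp_list`, and the placeholder fill values are hypothetical stand-ins, not the package's datasets or the output of a real imputer.

```r
# Toy stand-ins; these are NOT the package's datasets.
original <- data.frame(
  weight = c(3.2, NA, 4.1, 5.0, NA),
  sex = factor(c("Male", "Female", NA, "Female", "Male"))
)

# Build 5 hypothetical completed datasets by filling each gap with a
# fixed placeholder (a real imputer would draw plausible values instead).
imp_list <- lapply(1:5, function(i) {
  completed <- original
  completed$weight[is.na(completed$weight)] <- 4 + i / 10
  completed$sex[is.na(completed$sex)] <- "Male"
  completed
})

# Each element should be a data frame with the original dimensions and no NAs.
is_complete <- vapply(
  imp_list,
  function(d) {
    is.data.frame(d) && identical(dim(d), dim(original)) && !any(is.na(d))
  },
  logical(1)
)
is_complete
```

A check like this can catch a list element that still contains missing values or has lost rows before it is passed to a plotting function.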
This dataset is extracted from the NHANES III (1988-1994) for the age class Newborn (under 1 year). Please note that this example dataset only contains selected variables and is for demonstration purposes only.
data(newborn)
A data frame of 2107 rows and 16 variables, adapted from the NHANES III dataset. Nine variables contain missing values. Variable names and factor levels have been renamed for clarity and easier interpretation.
Household size. An integer variable ranging from 1 to 10. The original variable name in the NHANES III dataset is HSHSIZER.
Age at interview (screener), in months. An integer variable ranging from 2 to 11. The original variable name in the NHANES III dataset is HSAGEIR.
Sex of the subject. A factor variable with levels Male and Female. The original variable name in the NHANES III dataset is HSSEX.
Race of the subject. A factor variable with levels White, Black, and Other. The original variable name in the NHANES III dataset is DMARACER.
Ethnicity of the subject. A factor variable with levels Mexican-American, Other Hispanic, and Not Hispanic. The original variable name in the NHANES III dataset is DMAETHNR.
Combined race–ethnicity classification. A factor variable with levels Non-Hispanic White, Non-Hispanic Black, Mexican-American, and Other. The original variable name in the NHANES III dataset is DMARETHN.
Head circumference, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPHEAD.
Recumbent length, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPRECUM.
First subscapular skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPSB1.
Second subscapular skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPSB2.
First triceps skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPTR1.
Second triceps skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPTR2.
Body weight, in kilograms. Numeric. The original variable name in the NHANES III dataset is BMPWT.
Poverty income ratio. Numeric. The original variable name in the NHANES III dataset is DMPPIR.
Whether anyone living in the household smokes cigarettes inside the home. A factor variable with levels Yes and No. The original variable name in the NHANES III dataset is HFF1.
General health status of the subject. An ordered factor with levels Excellent, Very Good, Good, Fair, and Poor. The original variable name in the NHANES III dataset is HYD1.
https://wwwn.cdc.gov/nchs/nhanes/nhanes3/datafiles.aspx
U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics. Third National Health and Nutrition Examination Survey (NHANES III, 1988-1994): Multiply Imputed Data Set. CD-ROM, Series 11, No. 7A. Hyattsville, MD: Centers for Disease Control and Prevention, 2001. Includes access software: Adobe Systems, Inc. Acrobat Reader version 4.
This dataset is a small subset of newborn and is for demonstration purposes only. More information on the NHANES III data can be found at https://wwwn.cdc.gov/Nchs/Data/Nhanes3/7a/doc/mimodels.pdf
data(nhanes3)
A data frame of 500 rows and 6 variables. Three variables have missing values.
Age at interview (screener), in months. An integer variable ranging from 2 to 11. The original variable name in the NHANES III dataset is HSAGEIR.
Sex of the subject. A factor variable with levels Male and Female. The original variable name in the NHANES III dataset is HSSEX.
Ethnicity of the subject. A factor variable with levels Mexican-American, Other Hispanic, and Not Hispanic. The original variable name in the NHANES III dataset is DMAETHNR.
Head circumference, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPHEAD.
Recumbent length, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPRECUM.
Body weight, in kilograms. Numeric. The original variable name in the NHANES III dataset is BMPWT.
https://wwwn.cdc.gov/nchs/nhanes/nhanes3/datafiles.aspx
U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics. Third National Health and Nutrition Examination Survey (NHANES III, 1988-1994): Multiply Imputed Data Set. CD-ROM, Series 11, No. 7A. Hyattsville, MD: Centers for Disease Control and Prevention, 2001. Includes access software: Adobe Systems, Inc. Acrobat Reader version 4.
Main overimputation function, which calls the selected imputation method.
overimp(data, m = 5, p = 0.2, test_ratio = 0, method = "mixgb", seed = NULL, ...)
data |
A data frame with missing values. |
m |
The number of imputations. Default is 5. |
p |
The proportion of observed values to mask as additional missing values. Default is 0.2. |
test_ratio |
The proportion of test set. Default is 0, meaning no test set. |
method |
The imputation method. Currently one of "mixgb" or "mice"; more methods may be added in the future. Default is "mixgb". |
seed |
Random seed. |
... |
Other arguments to be passed into the overimp function. |
An overimp object containing the imputed training data, the imputed test data (if applicable), and the parameters required for plotting.
obj <- overimp(data = nhanes3, m = 3, p = 0.2, test_ratio = 0.2, method = "mixgb")
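The masking step behind overimputation can be illustrated in base R. The sketch below is only an assumption about the general idea (hide a proportion `p` of the observed entries so that they can later be re-imputed and compared with the truth); `mask_observed()` is a hypothetical helper and is not how `overimp()` is implemented.

```r
# Hypothetical helper, not part of any package API.
# Masks a proportion `p` of the observed entries in each column.
mask_observed <- function(data, p = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  masked <- data
  # Logical matrix recording which cells this helper masked
  mask <- matrix(FALSE, nrow(data), ncol(data),
                 dimnames = list(NULL, names(data)))
  for (v in names(data)) {
    observed <- which(!is.na(data[[v]]))
    n_mask <- floor(p * length(observed))
    # Sample positions within `observed` to avoid sample()'s
    # special behaviour for length-one numeric input
    idx <- observed[sample.int(length(observed), n_mask)]
    masked[[v]][idx] <- NA
    mask[idx, v] <- TRUE
  }
  list(masked = masked, mask = mask)
}

# Toy data with two missing values per column
toy <- data.frame(
  height = c(50.1, 52.3, NA, 49.8, 51.0, 53.2, NA, 50.5, 52.0, 51.7),
  weight = c(3.1, 3.6, 3.3, NA, 3.4, 3.9, 3.0, 3.5, NA, 3.7)
)
res <- mask_observed(toy, p = 0.25, seed = 1)

# Missing values per column before and after masking
colSums(is.na(toy))
colSums(is.na(res$masked))
```

Keeping the `mask` matrix makes it possible to look up the true values of the masked cells after imputation, which is what an overimputation diagnostic needs.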
Print method for vismi objects
## S3 method for class 'vismi'
print(x, ...)
x |
An object of class 'vismi' created by the |
... |
Additional arguments (not used). |
A vismi object, returned invisibly.
Generates a Trelliscope display for distributional characteristics across all variables.
trellis_vismi(data, imp_list, m = NULL, imp_idx = NULL, integerAsFactor = FALSE, title = "auto", subtitle = "auto", color_pal = NULL, marginal_x = "box+rug", nrow = 2, ncol = 4, path = NULL, verbose = FALSE, ...)
data |
A data frame containing the original data with missing values. |
imp_list |
A list of imputed data frames. |
m |
An integer specifying the number of imputed datasets to plot. It should be smaller than |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. Default is NULL (plot all). |
integerAsFactor |
A logical value indicating whether to treat integer variables as factors (TRUE) or numeric (FALSE). Default is FALSE. |
title |
A string specifying the title of the plot. Default is "auto" (automatic title based on |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle based on |
color_pal |
A named vector of colors for different imputation sets. If NULL (default), a default color palette is used. |
marginal_x |
A character string specifying the type of marginal plot to add for the x variable in 2D plots. Options are "hist", "box", "rug", "box+rug" (default), or NULL (no marginal plot) when interactive = TRUE. Options are "box", "rug", "box+rug" (default), or NULL (no marginal plot) when interactive = FALSE. |
nrow |
Number of rows in the Trelliscope display. Default is 2. |
ncol |
Number of columns in the Trelliscope display. Default is 4. |
path |
Optional path to save the Trelliscope display. If NULL, the display will not be saved to disk. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments passed to the underlying plotting functions, such as point_size, alpha, nbins, width, and boxpoints. |
A Trelliscope display object visualising distributional characteristics for all variables.
trellis_vismi(data = nhanes3, imp_list = imp_nhanes3, marginal_x = "box")
Generates a Trelliscope display for convergence diagnostics across all variables.
trellis_vismi_converge(obj, tick_vals = NULL, color_pal = NULL, title = "auto", subtitle = "auto", nrow = 2, ncol = 4, path = NULL, verbose = FALSE, ...)
obj |
An object of class 'mixgb' or 'mids' containing the intermediate imputation results for each iteration. |
tick_vals |
A numeric vector specifying the tick values for the x-axis (iterations). If NULL, default tick values will be used. |
color_pal |
A vector of colors to use for the imputation lines. If NULL, default colors will be used. |
title |
A string specifying the title of the plot. If NULL, no title is shown. If "auto", a title will be generated based on the input. Default is "auto". |
subtitle |
A string specifying the subtitle of the plot. If NULL, no subtitle is shown. If "auto", a subtitle will be generated based on the input. Default is "auto". |
nrow |
Number of rows in the Trelliscope display. Default is 2. |
ncol |
Number of columns in the Trelliscope display. Default is 4. |
path |
Optional path to save the Trelliscope display. If NULL, the display will not be saved to disk. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments to customize the Trelliscope display. |
A Trelliscope display object visualising convergence diagnostics for all variables.
library(mixgb)
set.seed(2026)
mixgb_obj <- mixgb(data = nhanes3, m = 3, maxit = 4, pmm.type = "auto", save.models = TRUE)
trellis_vismi_converge(obj = mixgb_obj)
Generates a Trelliscope display for overimputation diagnostics across all variables.
trellis_vismi_overimp(obj, m = NULL, imp_idx = NULL, integerAsFactor = FALSE, title = "auto", subtitle = "auto", num_plot = "cv", fac_plot = "cv", train_color_pal = NULL, test_color_pal = NULL, stack_y = FALSE, diag_color = "white", seed = 2025, nrow = 2, ncol = 4, path = NULL, verbose = FALSE, ...)
obj |
An object of class 'overimp' containing imputed datasets and parameters. |
m |
A single positive integer specifying the number of imputed datasets to plot. It should be smaller than the total number of imputed datasets in the object. Default is NULL (plot all). |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. Default is NULL (plot all). |
integerAsFactor |
A logical indicating whether integer variables should be treated as factors. Default is FALSE (treated as numeric). |
title |
A string specifying the title of the plot. Default is "auto" (automatic title). If NULL, no title is shown. |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle). If NULL, no subtitle is shown. |
num_plot |
A character string specifying the type of plot for numeric variables. Options are "cv" (cross-validation), "ridge", or "density". Default is "cv". |
fac_plot |
A character string specifying the type of plot for categorical variables. Options are "cv" (cross-validation), "bar", or "dodge". Default is "cv". |
train_color_pal |
A vector of colors for the training data. If NULL, default colors will be used. |
test_color_pal |
A vector of colors for the test data. If NULL, default colors will be used. |
stack_y |
A logical indicating whether to stack y-values in the plots. Default is FALSE. |
diag_color |
A color specification for the diagonal line in the plots. Default is "white". |
seed |
An integer seed for reproducibility. Default is 2025. |
nrow |
Number of rows in the Trelliscope display. Default is 2. |
ncol |
Number of columns in the Trelliscope display. Default is 4. |
path |
Optional path to save the Trelliscope display. If NULL, the display will not be saved to disk. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments to customize the plots, such as point_size, xlim, ylim. |
A Trelliscope display object visualising overimputation diagnostics for all variables.
obj <- overimp(data = nhanes3, m = 3, p = 0.2, test_ratio = 0, method = "mixgb")
trellis_vismi_overimp(obj = obj, stack_y = TRUE)
This function provides visual diagnostic tools for assessing multiply imputed datasets created with 'mixgb' or other imputers by inspecting the distributional characteristics of imputed variables. It supports 1D, 2D, and 3D visualisations for numeric and categorical variables using either interactive or static plots.
vismi(data, imp_list, x = NULL, y = NULL, z = NULL, m = NULL, imp_idx = NULL, interactive = FALSE, integerAsFactor = FALSE, title = "auto", subtitle = "auto", color_pal = NULL, marginal_x = "box+rug", marginal_y = NULL, verbose = FALSE, ...)
data |
A data frame containing the original data with missing values. |
imp_list |
A list of imputed data frames. |
x |
A character string specifying the name of the variable to plot on the x axis. Default is NULL. |
y |
A character string specifying the name of the variable to plot on the y axis. Default is NULL. |
z |
A character string specifying the name of the variable to plot on the z axis. Default is NULL. |
m |
An integer specifying the number of imputed datasets used for visualisation. It should be smaller than |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. Default is NULL (plot all). |
interactive |
A logical value indicating whether to create an interactive plotly plot (TRUE) or a static ggplot2 plot (FALSE). Default is FALSE. |
integerAsFactor |
A logical value indicating whether to treat integer variables as factors (TRUE) or numeric (FALSE). Default is FALSE. |
title |
A string specifying the title of the plot. Default is "auto" (automatic title based on |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle based on |
color_pal |
A named vector of colors for different imputation sets. If NULL (default), a default color palette is used. |
marginal_x |
A character string specifying the type of marginal plot to add for the x variable in 2D plots. Options are "hist", "box", "rug", "box+rug" (default), or NULL (no marginal plot) when interactive = TRUE. Options are "box", "rug", "box+rug" (default), or NULL (no marginal plot) when interactive = FALSE. |
marginal_y |
A character string specifying the type of marginal plot to add for the y variable in 2D plots. Options are "hist", "box", "rug", "box+rug", or NULL (default, no marginal plot) when interactive = TRUE. Options are "box", "rug", "box+rug", or NULL (default, no marginal plot) when interactive = FALSE. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments passed to the underlying plotting functions, such as point_size, alpha, nbins, width, and boxpoints. |
A plotly or ggplot2 object visualising the multiply-imputed data.
vismi(data = nhanes3, imp_list = imp_nhanes3, x = "weight_kg", y = "head_circumference_cm", z = "sex")
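The comparison that distributional diagnostics rely on, observed values versus the values filled in at originally missing positions, can be reproduced by hand. This is a minimal base-R sketch on toy data; `original`, `imp_list`, and the imputed values are made up for illustration and do not come from the package.

```r
# Toy stand-ins; not the package's datasets.
original <- data.frame(weight = c(3.2, NA, 4.1, 5.0, NA, 3.8))

# Two hypothetical completed datasets with made-up imputed values
imp_list <- list(
  imp1 = transform(original, weight = replace(weight, is.na(weight), c(3.9, 4.4))),
  imp2 = transform(original, weight = replace(weight, is.na(weight), c(4.2, 3.6)))
)

# Positions that were missing in the original data
was_missing <- is.na(original$weight)

# Observed values, and the imputed values from each completed dataset
observed <- original$weight[!was_missing]
imputed <- lapply(imp_list, function(d) d$weight[was_missing])

# Compare the mean of the observed values with the mean of each imputed set
c(observed = mean(observed), vapply(imputed, mean, numeric(1)))
```

Large gaps between the observed and imputed summaries are not necessarily a problem (the data may not be missing completely at random), but they indicate where a closer visual inspection is worthwhile.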
This function generates convergence diagnostic plots showing the mean and standard deviation (SD) of imputed values for a specified variable across iterations.
vismi_converge(obj, x, xlim = NULL, mean_lim = NULL, sd_lim = NULL, title = "auto", subtitle = "auto", tick_vals = NULL, color_pal = NULL, linewidth = 0.8, ...)
obj |
A 'mixgb' object returned by |
x |
The name of the variable to plot convergence for. |
xlim |
Optional numeric vector of length 2 specifying the x-axis limits for iterations. |
mean_lim |
Optional numeric vector of length 2 specifying the y-axis limits for mean values of the variable. |
sd_lim |
Optional numeric vector of length 2 specifying the y-axis limits for standard deviation values of the variable. |
title |
A string specifying the title of the plot. If NULL, no title is shown. If "auto", a title will be generated based on the input. Default is "auto". |
subtitle |
A string specifying the subtitle of the plot. If NULL, no subtitle is shown. If "auto", a subtitle will be generated based on the input. Default is "auto". |
tick_vals |
Optional numeric vector specifying x-axis tick values for iterations. |
color_pal |
A vector of m color codes (e.g., hex codes). If NULL, default colors will be used. |
linewidth |
The line width for the plot lines. Default is 0.8. |
... |
Additional arguments. |
Two side-by-side ggplot2 plots showing the mean and standard deviation (SD) of imputed values for a specified variable across iterations.
library(mixgb)
set.seed(2026)
mixgb_obj <- mixgb(data = nhanes3, m = 3, maxit = 4, pmm.type = "auto", save.models = TRUE)
vismi_converge(obj = mixgb_obj, x = "recumbent_length_cm")
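The quantities shown in a convergence plot (the mean and SD of the imputed values per iteration, one line per imputation) can be computed directly. The sketch below uses simulated numbers in a hypothetical array layout (iterations x imputations x missing cells); it does not reflect how 'mixgb' or 'mice' objects store intermediate results.

```r
set.seed(1)
n_iter <- 4      # number of iterations
m <- 3           # number of imputations
n_missing <- 20  # number of missing cells in the variable

# Simulated imputed values: iterations x imputations x missing cells
imputed <- array(
  rnorm(n_iter * m * n_missing, mean = 60, sd = 5),
  dim = c(n_iter, m, n_missing)
)

# Mean and SD of the imputed values for each (iteration, imputation) pair
chain_mean <- apply(imputed, c(1, 2), mean)
chain_sd <- apply(imputed, c(1, 2), sd)

# One line per imputation across iterations, as in a trace plot
matplot(chain_mean, type = "l", lty = 1, xlab = "Iteration", ylab = "Mean")
matplot(chain_sd, type = "l", lty = 1, xlab = "Iteration", ylab = "SD")
```

In a well-behaved run the lines for different imputations intermingle without a clear trend; persistent drift or separated lines suggests that more iterations may be needed.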
This function provides overimputation diagnostics for assessing imputations generated by 'mice', 'mixgb' or other imputers. It supports evaluation on both training and test data.
vismi_overimp(obj, x = NULL, y = NULL, z = NULL, m = NULL, imp_idx = NULL, integerAsFactor = FALSE, title = "auto", subtitle = "auto", num_plot = "cv", fac_plot = "cv", train_color_pal = NULL, test_color_pal = NULL, stack_y = FALSE, diag_color = NULL, seed = 2025, ...)
obj |
Overimputation object of class 'overimp' created by the |
x |
A character string specifying the name of the variable to plot on the x axis. Default is NULL. |
y |
A character string specifying the name of the variable to plot on the y axis. Default is NULL. |
z |
A character string specifying the name of the variable to plot on the z axis. Default is NULL. |
m |
A single positive integer specifying the number of imputed datasets to plot. It should be smaller than the total number of imputed datasets in the object. |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. |
integerAsFactor |
A logical indicating whether integer variables should be treated as factors. Default is FALSE (treated as numeric). |
title |
A string specifying the title of the plot. Default is "auto" (automatic title based on |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle based on |
num_plot |
A character string specifying the type of plot for numeric variables. |
fac_plot |
A character string specifying the type of plot for categorical variables. |
train_color_pal |
A vector of colors for the training data. If NULL, default colors will be used. |
test_color_pal |
A vector of colors for the test data. If NULL, default colors will be used. |
stack_y |
A logical indicating whether to stack y values in certain plots. Default is FALSE. |
diag_color |
A character string specifying the color of the diagonal line in scatter plots. Default is NULL. |
seed |
An integer specifying the random seed for reproducibility. Default is 2025. |
... |
Additional arguments to customize the plots, such as position, point_size, linewidth, alpha, xlim, ylim, boxpoints, width. |
An overimp_plot object displaying the overimputation plots for the training data and, if test_ratio > 0 was set in overimp(), for the test data.
obj <- overimp(data = nhanes3, m = 3, p = 0.2, test_ratio = 0.2, method = "mixgb")
vismi_overimp(obj = obj, x = "head_circumference_cm", num_plot = "cv")
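One generic way to read an overimputation result for a numeric variable is to compare the masked true values with their re-imputed counterparts. The sketch below uses made-up numbers and a per-imputation RMSE; it is an illustration of the idea under those assumptions, not the computation performed by `vismi_overimp()`.

```r
# Made-up values for illustration only
truth <- c(44.1, 45.3, 43.8, 46.0, 44.7)  # observed values that were masked

# Re-imputed values for the masked cells, one column per imputation
reimputed <- cbind(
  imp1 = c(44.5, 45.0, 44.2, 45.1, 44.9),
  imp2 = c(43.7, 45.9, 43.5, 46.4, 44.0),
  imp3 = c(44.3, 44.8, 44.6, 45.6, 45.2)
)

# Root mean squared error of each imputation against the masked truth
rmse <- apply(reimputed, 2, function(col) sqrt(mean((col - truth)^2)))
rmse

# Scatter of true vs re-imputed values with the y = x reference line
matplot(truth, reimputed, pch = 1,
        xlab = "Masked true value", ylab = "Re-imputed value")
abline(0, 1)
```

Points close to the diagonal indicate that the imputer recovers the masked values well; a systematic offset from the line points to bias in the imputation model.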