| Title: | Network Scale-Up Models for Aggregated Relational Data |
|---|---|
| Description: | Provides a variety of Network Scale-up Models for researchers to analyze Aggregated Relational Data, through the use of Stan and 'glmmTMB'. Also provides tools for model checking. In this version, the package implements models from Laga, I., Bao, L., and Niu, X. (2023) <doi:10.1080/01621459.2023.2165929>, Zheng, T., Salganik, M. J., and Gelman, A. (2006) <doi:10.1198/016214505000001168>, Killworth, P. D., Johnsen, E. C., McCarty, C., Shelley, G. A., and Bernard, H. R. (1998) <doi:10.1016/S0378-8733(96)00305-X>, and Killworth, P. D., McCarty, C., Bernard, H. R., Shelley, G. A., and Johnsen, E. C. (1998) <doi:10.1177/0193841X9802200205>. |
| Authors: | Ian Laga [aut, cre] |
| Maintainer: | Ian Laga <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.2-2 |
| Built: | 2026-06-24 15:34:16 UTC |
| Source: | https://github.com/ilaga/networkscaleup |
Compute Pearson Residuals for ARD matrix and fitted model
construct_pearson(ard, model_fit)
ard |
ARD matrix y |
model_fit |
estimated model |
a vector (column by column) of corresponding residuals from ARD matrix
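As a rough illustration of what a Pearson residual is for count data, the sketch below assumes a Poisson model and a matrix of fitted means `mu`. Both `pearson_poisson()` and `mu` are hypothetical names introduced here; `construct_pearson()` takes what it needs from `model_fit` and may differ in detail.

```r
# Sketch only: Pearson residuals under an assumed Poisson model.
# `pearson_poisson` and `mu` are hypothetical, not part of the package.
pearson_poisson <- function(ard, mu) {
  stopifnot(all(dim(ard) == dim(mu)), all(mu > 0))
  resid_mat <- (ard - mu) / sqrt(mu)  # (observed - expected) / standard deviation
  as.vector(resid_mat)                # stacks columns: column 1 first, then column 2, ...
}

ard <- matrix(c(0, 2, 1, 4, 3, 5), nrow = 3)  # 3 respondents, 2 groups
mu  <- matrix(c(1, 1, 1, 4, 4, 4), nrow = 3)  # assumed fitted means
pearson_res <- pearson_poisson(ard, mu)
```

Because `as.vector()` reads a matrix column by column, the output order matches the "column by column" convention described above.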
Compute Randomized Quantile Residuals for ARD Models
construct_rqr(ard, model_fit)
ard |
ard matrix |
model_fit |
fitted model, along with required details |
a vector of residuals (column by column)
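The idea behind randomized quantile residuals can be sketched in base R. The example assumes a Poisson model with known fitted means; `rqr_poisson()` and `mu` are hypothetical names, and the package's own computation (which works on the log scale for numerical stability) may differ.

```r
# Sketch only: randomized quantile residuals under an assumed Poisson model.
# `rqr_poisson` and `mu` are hypothetical, not part of the package.
rqr_poisson <- function(ard, mu) {
  y <- as.vector(ard)        # column-by-column ordering
  m <- as.vector(mu)
  lower <- ppois(y - 1, m)   # P(Y < y); equals 0 when y = 0
  upper <- ppois(y, m)       # P(Y <= y)
  u <- runif(length(y), min = lower, max = upper)  # uniform draw within the CDF jump
  qnorm(u)                   # map to the standard normal scale
}

set.seed(1)
mu  <- matrix(3, nrow = 50, ncol = 4)             # assumed fitted means
ard <- matrix(rpois(200, lambda = 3), nrow = 50)  # data simulated from the same model
rqr_res <- rqr_poisson(ard, mu)
```

When the model is correctly specified, these residuals are approximately standard normal, so a normal Q-Q plot is a natural check.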
Plots of the estimated covariance structure from a given fitted model
cov_plots(ard, model_fit, x_cov, resid_type = c("rqr", "pearson_residuals"), method = "lm", se = F)
ard |
ard matrix |
model_fit |
a fitted object from [fit_mle()] or [fit_map()] |
x_cov |
covariate matrix |
resid_type |
the type of residuals to use |
method |
the method to use |
se |
whether to compute standard errors of estimates |
a list of ggplots, corresponding to covariance structure
Dispersion Metric for Fitted ARD Model
dispersion_metric(ard, model_fit)
ard |
ard matrix |
model_fit |
list of fitted model and details |
a ggplot of the hanging rootogram
A simulated data set to demonstrate and test the NSUM methods. The data was simulated from the basic Killworth Binomial model.
example_data
A named list for an ARD survey from 100 respondents about 5 subpopulations.
A '100 x 5' matrix with integer-valued responses
A '100 x 5' matrix with simulated answers from a 1-5 Likert scale
A '100 x 4' matrix with answers from each respondent about 4 demographic questions
An integer specifying the total population size
A vector with the 5 true subpopulation sizes
A vector with the 100 true respondent degrees
Fit basic Poisson and Negative Binomial models using glmmTMB
fit_mle(ard, x_cov_global = NULL, x_cov_local = NULL, family = c("poisson", "nbinomial"))
ard |
n_i by n_k ARD matrix |
x_cov_global |
n_i by p_global covariate matrix of global covariates |
x_cov_local |
n_i by p_local covariate matrix of local covariates |
family |
distribution to fit, either "poisson" or "nbinomial" |
list containing fitted model and extracted parameters
Compute Surrogate Residuals for ARD Models
get_surrogate(ard, model_fit = NULL)
ard |
the ARD matrix |
model_fit |
list containing fitted model, details |
a vector of residuals (column by column)
Hanging Rootogram for Fitted ARD Model
hang_rootogram_ard(ard, model_fit, width = 0.9, x_max = NULL, by_group = FALSE)
ard |
ard matrix |
model_fit |
fitted model object |
width |
width of bars |
x_max |
the maximum x value to display |
by_group |
logical; if TRUE, create separate rootograms for each column (group) |
a ggplot of the hanging rootogram (single plot if by_group=FALSE, combined plot if by_group=TRUE)
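The quantities behind a hanging rootogram can be tabulated without any plotting. The sketch below assumes a Poisson model with fitted means `mu`; `rootogram_table()` is a hypothetical helper, not the package's implementation.

```r
# Sketch only: the numbers a hanging rootogram displays, under an assumed
# Poisson model. `rootogram_table` and `mu` are hypothetical names.
rootogram_table <- function(y, mu, x_max = max(y)) {
  counts <- 0:x_max
  # Observed frequency of each count value (zero if the value never occurs)
  observed <- as.vector(table(factor(y, levels = counts)))
  # Expected frequency: sum of the Poisson pmf over all observations
  expected <- sapply(counts, function(k) sum(dpois(k, mu)))
  data.frame(
    count         = counts,
    sqrt_observed = sqrt(observed),
    sqrt_expected = sqrt(expected),
    # Bars hang from the expected curve; a bottom far from 0 signals misfit
    bar_bottom    = sqrt(expected) - sqrt(observed)
  )
}

set.seed(3)
y   <- rpois(100, lambda = 2)
mu  <- rep(2, 100)  # assumed fitted means
tab <- rootogram_table(y, mu)
```

Working on the square-root scale keeps small frequencies visible, which is why misfit in the tails is easier to see than in an ordinary histogram.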
Fit Killworth models to ARD. This function estimates the degrees and population sizes using the plug-in MLE and MLE estimators.
killworth(ard, known_sizes = NULL, known_ind = 1:length(known_sizes), N = NULL, model = c("MLE", "PIMLE"))
ard |
The 'n_i x n_k' matrix of non-negative ARD integer responses, where the '(i,k)th' element corresponds to the number of people that respondent 'i' knows in subpopulation 'k'. |
known_sizes |
The known subpopulation sizes corresponding to a subset of
the columns of |
known_ind |
The indices that correspond to the columns of |
N |
The known total population size. |
model |
A character string corresponding to either the plug-in MLE (PIMLE) or the MLE (MLE). The function assumes MLE by default. |
A named list with the estimated degrees and sizes.
Killworth, P. D., Johnsen, E. C., McCarty, C., Shelley, G. A., and Bernard, H. R. (1998). A Social Network Approach to Estimating Seroprevalence in the United States, Social Networks, 20, 23–50
Killworth, P. D., McCarty, C., Bernard, H. R., Shelley, G. A., and Johnsen, E. C. (1998). Estimation of Seroprevalence, Rape and Homelessness in the United States Using a Social Network Approach, Evaluation Review, 22, 289–308
Laga, I., Bao, L., and Niu, X. (2021). Thirty Years of the Network Scale-up Method, Journal of the American Statistical Association, 116:535, 1548–1559
    # Analyze an example ard data set using the killworth function
    data(example_data)

    ard <- example_data$ard
    subpop_sizes <- example_data$subpop_sizes
    N <- example_data$N

    mle.est <- killworth(ard,
      known_sizes = subpop_sizes[c(1, 2, 4)],
      known_ind = c(1, 2, 4),
      N = N, model = "MLE"
    )

    pimle.est <- killworth(ard,
      known_sizes = subpop_sizes[c(1, 2, 4)],
      known_ind = c(1, 2, 4),
      N = N, model = "PIMLE"
    )

    ## Compare estimates with the truth
    plot(mle.est$degrees, example_data$degrees)

    data.frame(
      true = subpop_sizes[c(3, 5)],
      mle = mle.est$sizes,
      pimle = pimle.est$sizes
    )
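For intuition, the closed-form estimators from the Killworth et al. (1998) papers can be written in a few lines of base R. This is a minimal sketch of the textbook formulas, not the package's code; `killworth_sketch()` is a hypothetical name, and it assumes every respondent has a positive estimated degree.

```r
# Sketch only: textbook Killworth estimators on a toy data set.
# `killworth_sketch` is hypothetical, not part of the package.
killworth_sketch <- function(ard, known_sizes, known_ind, N) {
  # Degree estimate: scale each respondent's total over the known groups
  degrees <- N * rowSums(ard[, known_ind, drop = FALSE]) / sum(known_sizes)
  unknown <- ard[, -known_ind, drop = FALSE]
  # MLE: ratio of sums across respondents
  mle <- N * colSums(unknown) / sum(degrees)
  # PIMLE: average of per-respondent ratios (requires degrees > 0)
  pimle <- N * colMeans(unknown / degrees)
  list(degrees = degrees, mle = mle, pimle = pimle)
}

# Two respondents, two known groups of size 100, one unknown group
toy_ard <- matrix(c(2, 1,
                    2, 1,
                    1, 1), nrow = 2)
toy_est <- killworth_sketch(toy_ard, known_sizes = c(100, 100),
                            known_ind = 1:2, N = 1000)
```

Here the degree estimates are 20 and 10, so the MLE is 1000 * 2 / 30 and the PIMLE is 1000 * mean(c(1/20, 1/10)). The two estimators differ because one takes a ratio of sums and the other an average of ratios.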
log computed uniform quantile
log_mix_uniform(logFl, logFu)
logFl |
log of lower value |
logFu |
log of upper value |
log value of uniform between Flower and Fupper
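One way to obtain the log of a uniform draw between two values while staying on the log scale is sketched below. `log_mix_uniform_sketch()` and its `u` argument are hypothetical; the package's function may handle edge cases differently.

```r
# Sketch only: log of a uniform draw between Fl and Fu, given their logs.
# `log_mix_uniform_sketch` and `u` are hypothetical, not part of the package.
log_mix_uniform_sketch <- function(logFl, logFu, u = runif(length(logFl))) {
  stopifnot(all(logFl <= logFu))
  # Fl + u * (Fu - Fl) = Fl * (1 + u * expm1(logFu - logFl))
  out <- logFl + log1p(u * expm1(logFu - logFl))
  # When Fl = 0 (logFl = -Inf) the expression above is NaN; the draw is u * Fu
  zero_lower <- is.infinite(logFl) & logFl < 0
  out[zero_lower] <- log(u[zero_lower]) + logFu[zero_lower]
  out
}

log_u <- log_mix_uniform_sketch(
  logFl = log(c(0.2, 0)),
  logFu = log(c(0.5, 0.3)),
  u     = c(0.5, 0.5)   # fixed here so the result is reproducible
)
```

With `u` fixed at 0.5, the result is the log of the midpoint of each interval, so `exp(log_u)` gives 0.35 and 0.15.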
Generate simulated ARD
make_ard(n_i = 500, n_k = 20, N = 1e+06, p = 0, p_global_nonzero = 0, p_local_nonzero = 0, group_corr = FALSE, degree_corr = FALSE, family = c("poisson", "nbinomial"), omega_range = c(1, 5), alpha_mean = 5, alpha_sd = 0.15, eta = 3, seed = NULL)
n_i |
number of respondents (rows) |
n_k |
number of groups (columns) |
N |
total population size |
p |
number of collected covariates |
p_global_nonzero |
number of non-zero global covariates |
p_local_nonzero |
number of non-zero local covariates |
group_corr |
group correlation |
degree_corr |
degree correlation |
family |
sampling distribution |
omega_range |
minimum and maximum omega for negative binomial overdispersion |
alpha_mean |
mean of alphas |
alpha_sd |
variance of alphas |
eta |
correlation hyperparameter for LKJ prior |
seed |
random seed |
simulated ARD along with all true parameters. Parameters which are not used in a specific setting are set to NULL.
make_ard(N = 10000, family = "poisson")
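To show what simulated ARD looks like in the simplest case, the sketch below draws Poisson counts with mean equal to a respondent's degree times a group's prevalence. It is a deliberately reduced stand-in: `simulate_poisson_ard()` is hypothetical, and the degree and size distributions are assumptions chosen for illustration, not those used by `make_ard()`.

```r
# Sketch only: basic Poisson ARD with no covariates or correlation.
# `simulate_poisson_ard` and its distributional choices are assumptions.
simulate_poisson_ard <- function(n_i = 100, n_k = 5, N = 1e5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  degrees <- round(exp(rnorm(n_i, mean = 5, sd = 0.5)))  # assumed log-normal degrees
  sizes   <- round(N * runif(n_k, 0.001, 0.02))          # assumed group sizes
  mu      <- outer(degrees, sizes / N)                   # E[y_ik] = d_i * N_k / N
  ard     <- matrix(rpois(n_i * n_k, lambda = as.vector(mu)),
                    nrow = n_i, ncol = n_k)
  list(ard = ard, degrees = degrees, sizes = sizes, N = N)
}

sim <- simulate_poisson_ard(n_i = 50, n_k = 4, seed = 123)
dim(sim$ard)
```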
Construct tibble from ARD matrix
make_ard_tidy(ard)
ard |
the ARD matrix |
a tibble of ARD, with columns for row/col index
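The reshaping from a matrix to one row per (respondent, group) pair can be sketched in base R. The example returns a plain data frame rather than a tibble, and the column names `row`, `col`, and `count` are assumptions; `ard_to_long()` is a hypothetical helper.

```r
# Sketch only: long-format ARD in base R.
# `ard_to_long` and its column names are hypothetical.
ard_to_long <- function(ard) {
  data.frame(
    row   = as.vector(row(ard)),  # respondent index
    col   = as.vector(col(ard)),  # group index
    count = as.vector(ard)        # reported count
  )
}

ard  <- matrix(1:6, nrow = 2)  # 2 respondents, 3 groups
long <- ard_to_long(ard)
```

The long format is convenient for plotting and for model-fitting functions that expect one observation per row.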
Provides a variety of Network Scale-up Models for researchers to analyze Aggregated Relational Data, mostly through the use of Stan.
Maintainer: Ian Laga [email protected]
Authors:
Owen G. Ward [email protected]
Anna L. Smith
Benjamin Vogel
Jieyun Wang
Le Bao [email protected]
Xiaoyue Niu [email protected]
This function fits the ARD using the Overdispersed model using the original Gibbs-Metropolis Algorithm provided in Zheng, Salganik, and Gelman (2006). The population size estimates and degrees are scaled using a post-hoc procedure. For the Stan implementation, see overdispersedStan.
overdispersed(ard, known_sizes = NULL, known_ind = NULL, G1_ind = NULL, G2_ind = NULL, B2_ind = NULL, N = NULL, warmup = 1000, iter = 1500, refresh = NULL, thin = 1, verbose = FALSE, alpha_tune = 0.4, beta_tune = 0.2, omega_tune = 0.2, init = "MLE")
ard |
The 'n_i x n_k' matrix of non-negative ARD integer responses, where the '(i,k)th' element corresponds to the number of people that respondent 'i' knows in subpopulation 'k'. |
known_sizes |
The known subpopulation sizes corresponding to a subset of
the columns of |
known_ind |
The indices that correspond to the columns of |
G1_ind |
A vector of indices denoting the columns of 'ard' that correspond to the primary scaling groups, i.e. the collection of rare girls' names in Zheng, Salganik, and Gelman (2006). By default, all known_sizes are used. If G2_ind and B2_ind are not provided, 'C = C_1', so only G1_ind are used. If G1_ind is not provided, no scaling is performed. |
G2_ind |
A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the first secondary scaling groups, i.e. the collection of somewhat popular girls' names. |
B2_ind |
A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the second secondary scaling groups, i.e. the collection of somewhat popular boys' names. |
N |
The known total population size. |
warmup |
A positive integer specifying the number of warmup samples. |
iter |
A positive integer specifying the total number of samples (including warmup). |
refresh |
An integer specifying how often the progress of the sampling
should be reported. By default, resorts to every 10
|
thin |
A positive integer specifying the interval for saving posterior samples. Default value is 1 (i.e. no thinning). |
verbose |
Logical value, specifying whether sampling progress should be reported. |
alpha_tune |
A positive numeric indicating the standard deviation used as the jumping scale in the Metropolis step for alpha. Defaults to 0.4, which has worked well for other ARD datasets. |
beta_tune |
A positive numeric indicating the standard deviation used as the jumping scale in the Metropolis step for beta. Defaults to 0.2, which has worked well for other ARD datasets. |
omega_tune |
A positive numeric indicating the standard deviation used as the jumping scale in the Metropolis step for omega. Defaults to 0.2, which has worked well for other ARD datasets. |
init |
A named list with names corresponding to the first-level model
parameters, namely 'alpha', 'beta', and 'omega'. By default the 'alpha' and
'beta' parameters are initialized at the values corresponding to the
Killworth MLE estimates (for the missing 'beta'), with all 'omega' set to
20. Alternatively, |
This function fits the overdispersed NSUM model using the Metropolis-Gibbs sampler provided in Zheng et al. (2006).
A named list with the estimated posterior samples. The estimated parameters are named as follows, with additional descriptions as needed:
Log degree, if scaled, else raw alpha parameters
Log prevalence, if scaled, else raw beta parameters
Inverse of overdispersion parameters
Standard deviation of alphas
Mean of betas
Standard deviation of betas
Overdispersion parameters
If scaled, the following additional parameters are included:
Mean of log degrees
Degree estimates
Subpopulation size estimates
Zheng, T., Salganik, M. J., and Gelman, A. (2006). How many people do you know in prison, Journal of the American Statistical Association, 101:474, 409–423
    # Analyze an example ard data set using Zheng et al. (2006) models
    # Note that in practice, both warmup and iter should be much higher
    data(example_data)

    ard <- example_data$ard
    subpop_sizes <- example_data$subpop_sizes
    known_ind <- c(1, 2, 4)
    N <- example_data$N

    overdisp.est <- overdispersed(ard,
      known_sizes = subpop_sizes[known_ind],
      known_ind = known_ind,
      G1_ind = 1,
      G2_ind = 2,
      B2_ind = 4,
      N = N,
      warmup = 50,
      iter = 100
    )

    # Compare size estimates
    data.frame(
      true = subpop_sizes,
      basic = colMeans(overdisp.est$sizes)
    )

    # Compare degree estimates
    plot(example_data$degrees, colMeans(overdisp.est$degrees))

    # Look at overdispersion parameter
    colMeans(overdisp.est$omegas)
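The example summarises posterior draws with `colMeans()`, which implies one row per draw and one column per parameter. Under that assumption, interval summaries follow the same pattern. The sketch below uses a simulated stand-in matrix, `fake_sizes`, in place of `overdisp.est$sizes`, so it runs without fitting a model.

```r
# Sketch only: posterior means and 95% intervals from a draws-by-parameter matrix.
# `fake_sizes` is a simulated stand-in for overdisp.est$sizes, not real output.
set.seed(42)
fake_sizes <- cbind(rnorm(200, mean = 1000, sd = 50),
                    rnorm(200, mean = 5000, sd = 200))

size_summary <- data.frame(
  mean  = colMeans(fake_sizes),
  lower = apply(fake_sizes, 2, quantile, probs = 0.025),
  upper = apply(fake_sizes, 2, quantile, probs = 0.975)
)
size_summary
```

Reporting an interval alongside the mean makes it easier to judge whether a difference from the true size is within sampling uncertainty.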
This function fits the ARD using the Overdispersed model in Stan. The population size estimates and degrees are scaled using a post-hoc procedure. For the Gibbs-Metropolis algorithm implementation, see overdispersed.
overdispersedStan(ard, known_sizes = NULL, known_ind = NULL, G1_ind = NULL, G2_ind = NULL, B2_ind = NULL, N = NULL, chains = 3, cores = 1, warmup = 1000, iter = 1500, thin = 1, return_fit = FALSE, ...)
ard |
The 'n_i x n_k' matrix of non-negative ARD integer responses, where the '(i,k)th' element corresponds to the number of people that respondent 'i' knows in subpopulation 'k'. |
known_sizes |
The known subpopulation sizes corresponding to a subset of
the columns of |
known_ind |
The indices that correspond to the columns of |
G1_ind |
A vector of indices denoting the columns of 'ard' that correspond to the primary scaling groups, i.e. the collection of rare girls' names in Zheng, Salganik, and Gelman (2006). By default, all known_sizes are used. If G2_ind and B2_ind are not provided, 'C = C_1', so only G1_ind are used. If G1_ind is not provided, no scaling is performed. |
G2_ind |
A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the first secondary scaling groups, i.e. the collection of somewhat popular girls' names. |
B2_ind |
A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the second secondary scaling groups, i.e. the collection of somewhat popular boys' names. |
N |
The known total population size. |
chains |
A positive integer specifying the number of Markov chains. |
cores |
A positive integer specifying the number of cores to use to run the Markov chains in parallel. |
warmup |
A positive integer specifying the number of warmup samples for each chain. Matches the usage in stan. |
iter |
A positive integer specifying the total number of samples for each chain (including warmup). Matches the usage in stan. |
thin |
A positive integer specifying the interval for saving posterior samples. Default value is 1 (i.e. no thinning). |
return_fit |
A logical indicating whether the fitted Stan model should be returned instead of the rstan::extracted and scaled parameters. This is FALSE by default. |
... |
Additional arguments to be passed to stan. |
This function fits the overdispersed NSUM model of Zheng et al. (2006) using Stan.
Either the full fitted Stan model if return_fit = TRUE, else a
named list with the estimated parameters extracted using
extract (the default). The estimated parameters are named as
follows, with additional descriptions as needed:
Log degree, if 'scaling = TRUE', else raw alpha parameters
Log prevalence, if 'scaling = TRUE', else raw beta parameters
Inverse of overdispersion parameters
Standard deviation of alphas
Mean of betas
Standard deviation of betas
Overdispersion parameters
If 'scaling = TRUE', the following additional parameters are included:
Mean of log degrees
Degree estimates
Subpopulation size estimates
Zheng, T., Salganik, M. J., and Gelman, A. (2006). How many people do you know in prison, Journal of the American Statistical Association, 101:474, 409–423
    # Analyze an example ard data set using Zheng et al. (2006) models
    # Note that in practice, both warmup and iter should be much higher
    ## Not run:
    data(example_data)

    ard <- example_data$ard
    subpop_sizes <- example_data$subpop_sizes
    known_ind <- c(1, 2, 4)
    N <- example_data$N

    overdisp.est <- overdispersedStan(ard,
      known_sizes = subpop_sizes[known_ind],
      known_ind = known_ind,
      G1_ind = 1,
      G2_ind = 2,
      B2_ind = 4,
      N = N,
      chains = 1,
      cores = 1,
      warmup = 250,
      iter = 500
    )

    # Compare size estimates
    round(data.frame(
      true = subpop_sizes,
      basic = colMeans(overdisp.est$sizes)
    ))

    # Compare degree estimates
    plot(example_data$degrees, colMeans(overdisp.est$degrees))

    # Look at overdispersion parameter
    colMeans(overdisp.est$omegas)
    ## End(Not run)
Plot residuals against fitted values
plot_fitted(ard, model_fit = NULL, resid = c("rqr", "pearson", "surrogate"))
ard |
ARD matrix (may be needed) |
model_fit |
fitted model |
resid |
the type of residuals to be used |
a ggplot showing fitted values against residuals
Construct residual (row/column) correlation matrix
residual_correlation(ard_residuals, ard, type = "column")
ard_residuals |
vector of residuals |
ard |
ard matrix |
type |
type of correlation to use (row or column) |
a ggplot of the specified correlation matrix
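The correlation matrix underlying such a plot can be computed directly by reshaping the residual vector back into the shape of the ARD matrix. The sketch below is base R only and returns the matrix rather than a ggplot; `residual_cor()` is a hypothetical helper, and the residuals used are simple centred counts chosen for illustration.

```r
# Sketch only: correlation of residuals across groups (columns) or respondents (rows).
# `residual_cor` is hypothetical, not part of the package.
residual_cor <- function(ard_residuals, ard, type = c("column", "row")) {
  type <- match.arg(type)
  # Residuals are stored column by column, so matrix() restores the layout
  resid_mat <- matrix(ard_residuals, nrow = nrow(ard), ncol = ncol(ard))
  if (type == "column") cor(resid_mat) else cor(t(resid_mat))
}

set.seed(7)
ard <- matrix(rpois(60, lambda = 3), nrow = 20)  # 20 respondents, 3 groups
res <- as.vector(ard - mean(ard))                # illustrative residuals only
col_cor <- residual_cor(res, ard, type = "column")
```

Large off-diagonal entries in the column version suggest residual dependence between groups that the fitted model has not captured.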
Construct heatmap of residuals
residual_heatmap(ard_residuals, ard)
ard_residuals |
a vector (column wise) of estimated residuals |
ard |
an ard matrix |
A ggplot of residual heatmap
compute numerically stable negative binomial rqr
rqr_nbinom_logs(y, size, prob, eps = 1e-12)
y |
observed value |
size |
size parameter |
prob |
prob parameter |
eps |
precision parameter |
appropriate randomized quantile residual
compute numerically stable Poisson rqr
rqr_pois_logs(y, mu, eps = 1e-12)
y |
observed value |
mu |
mean value of poisson |
eps |
precision parameter |
appropriate randomized quantile residual
This function scales estimates from either the overdispersed model or from the correlated models. Several scaling options are available.
scaling(log_degrees, log_prevalences, scaling = c("all", "overdispersed", "weighted", "weighted_sq"), known_sizes = NULL, known_ind = NULL, Correlation = NULL, G1_ind = NULL, G2_ind = NULL, B2_ind = NULL, N = NULL)
log_degrees |
The matrix of estimated raw log degrees from either the overdispersed or correlated models. |
log_prevalences |
The matrix of estimated raw log prevalences from either the overdispersed or correlated models. |
scaling |
A character string naming the scaling procedure used to transform estimates to degrees and subpopulation sizes. Scaling options are 'overdispersed', 'all' (the default), 'weighted', or 'weighted_sq' ('weighted' and 'weighted_sq' are only available if 'Correlation' is provided). Further details are provided in the Details section. |
known_sizes |
The known subpopulation sizes corresponding to a subset of
the columns of |
known_ind |
The indices that correspond to the columns of |
Correlation |
The estimated correlation matrix used to calculate scaling weights. Required if 'scaling = weighted' or 'scaling = weighted_sq'. |
G1_ind |
If 'scaling = overdispersed', a vector of indices corresponding to the subpopulations that belong to the primary scaling groups, i.e. the collection of rare girls' names in Zheng, Salganik, and Gelman (2006). By default, all known_sizes are used. If G2_ind and B2_ind are not provided, 'C = C_1', so only G1_ind are used. If G1_ind is not provided, no scaling is performed. |
G2_ind |
If 'scaling = overdispersed', a vector of indices corresponding to the subpopulations that belong to the first secondary scaling groups, i.e. the collection of somewhat popular girls' names. |
B2_ind |
If 'scaling = overdispersed', a vector of indices corresponding to the subpopulations that belong to the second secondary scaling groups, i.e. the collection of somewhat popular boys' names. |
N |
The known total population size. |
The 'scaling' options are described below:
No scaling is performed
With 'overdispersed', the scaling procedure outlined in Zheng et al. (2006) is performed. In this case, at least 'G1_ind' must be provided. See overdispersedStan for more details.
With 'all', all subpopulations with known sizes are used to scale the parameters, using a modified scaling procedure that standardizes the sizes so each population is weighted equally. Additional details are provided in Laga et al. (2021).
With 'weighted', all subpopulations with known sizes are weighted according to their correlation with the unknown subpopulation size. Additional details are provided in Laga et al. (2021).
'weighted_sq' is the same as 'weighted', except the weights are squared, providing more relative weight to subpopulations with higher correlation.
The named list containing the scaled log degree, degree, log prevalence, and size estimates
Zheng, T., Salganik, M. J., and Gelman, A. (2006). How many people do you know in prison, Journal of the American Statistical Association, 101:474, 409–423
Laga, I., Bao, L., and Niu, X. (2021). A Correlated Network Scaleup Model: Finding the Connection Between Subpopulations
Tracy-Widom test for residual group correlation
tw_group_corr_test(model_fit, correction = c("none", "half"), plot = TRUE)
model_fit |
fitted model object |
correction |
correction constant, either "none", "half" |
plot |
a logical, whether to return a ggplot density plot of TW with observed statistic |
a list containing test statistic, p-value, and diagnostic plots