********************************************************************************
* runmixregls - A PROGRAM TO RUN THE MIXREGLS MIXED-EFFECTS LOCATION SCALE
* SOFTWARE FROM WITHIN STATA
*
* George Leckie
* Centre for Multilevel Modelling
* University of Bristol
*
* http://www.bristol.ac.uk/cmm/software/runmixregls/
********************************************************************************
* MIXREGLS is developed by Hedeker and Nordgren (2013).
*
* Hedeker, D. and Nordgren, R. 2013. MIXREGLS: A Program for Mixed-effects
* Location Scale Analysis. Journal of Statistical Software, 52, 12, 1-38.
* URL: http://www.jstatsoft.org/v52/i12.
********************************************************************************
* This do-file differs slightly from that released with the JSS article.
* We have improved runmixregls over time and that has required us to make
* a few edits to this do-file to make sure it still replicates the article.
* We document these changes below.
********************************************************************************
* SECTION 1. Introduction
********************************************************************************

********************************************************************************
* SECTION 2. Review of the mixed-effects location scale model
********************************************************************************

********************************************************************************
* SECTION 3. Installing runmixregls
********************************************************************************

* Install the runmixregls command from the Statistical Software Components
* (SSC) archive. You only need to run this command once.
//ssc install runmixregls

* If you have already installed runmixregls from the SSC, you can check that
* you are using the latest version by typing the following command.
* Uncomment this command if you wish to run it (i.e. remove the //).
//adoupdate runmixregls

********************************************************************************
* SECTION 4. Syntax diagram
********************************************************************************

* View the runmixregls help file
help runmixregls

********************************************************************************
* SECTION 5. Example 1 - Reisby depression study
********************************************************************************

*-------------------------------------------------------------------------------
* SECTION 5.1. Data
*-------------------------------------------------------------------------------

* Load the Reisby data into memory
use "http://www.bristol.ac.uk/cmm/media/runmixregls/reisby", clear
// In contrast to the JSS article, here we refer to the version of the data
// saved on the web.

* Describe the data contents
codebook, compact

* Recode missing values from -9 to Stata's system missing value
recode hamdep (-9 = .)

* Figure 1 - Spaghetti plot of depression trajectories plotted separately
* for non-endogenous (left panel) and endogenous (right panel) subjects
twoway (line hamdep week, connect(ascending)), xlabel(0(1)5) by(endog)

*-------------------------------------------------------------------------------
* SECTION 5.2. Model
*-------------------------------------------------------------------------------

* Declare the data to be panel data
xtset id

* Fit the model
runmixregls hamdep week endog endweek, ///
    between(endog)                     ///
    within(week endog)

*-------------------------------------------------------------------------------
* SECTION 5.3. Random effects ...
*-------------------------------------------------------------------------------

* Refit the model specifying the reffects() and residuals() options to
* retrieve the standardized empirical Bayes random effects and residuals,
* respectively
runmixregls hamdep week endog endweek, ///
    between(endog)                     ///
    within(week endog)                 ///
    reffects(theta)                    ///
    residuals(estd)
// In the original implementation of runmixregls as illustrated in the JSS
// article, we specified the option reffects(theta1 theta2) to retrieve the
// random location and scale effects. In the current implementation of
// runmixregls, we have replaced this with reffects(theta), where runmixregls
// will automatically name the two random effect variables theta0 and theta1.

* Rename the random location and scale effects to match the names given in
* the JSS article
rename theta1 theta2
rename theta0 theta1

* Fill in any missing values of theta1 and theta2
bysort id (theta1): replace theta1 = theta1[1]
bysort id (theta2): replace theta2 = theta2[1]
// In the original implementation of runmixregls, these were filled in
// automatically, which is why there is no reference to this step in the JSS
// article.

* Manually generate the standardized residuals as this option is not working
* in the current implementation of runmixregls.
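* For reference, the quantity reconstructed in the next few steps is (a
* restatement of the calculation below in equation form):
*
*   estd = (hamdep - xb - sqrt(bsvar)*theta1) / sqrt(wsvar)
*
* where xb is the predicted mean, bsvar = exp(BS variance function) scales
* the standardized random-location effect theta1 back to the response
* metric, and wsvar = exp(WS variance function) is the predicted WS
* variance, so dividing by its square root standardizes the residual.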
* First drop the original variable
drop estd

* Then save the predicted location
generate xb = _b[Mean:_cons] + _b[Mean:week]*week + _b[Mean:endog]*endog ///
    + _b[Mean:endweek]*endweek

* Next save the predicted between-subject random location variance
generate bsvar = exp(_b[Between:_cons] + _b[Between:endog]*endog)

* Then save the within-subject residual variance
generate wsvar = exp(_b[Within:_cons] + _b[Within:week]*week ///
    + _b[Within:endog]*endog + _b[Association:linear]*theta1 ///
    + _b[Scale:sigma]*theta2)

* Last calculate the residual by subtraction and standardize it by dividing
* through by the within-subject residual standard deviation
generate estd = (hamdep - xb - sqrt(bsvar)*theta1)/sqrt(wsvar)

* Figure 2 - Histogram of standardized model residuals
histogram estd, width(0.5) start(-3) frequency

* Figure 3 - Bivariate scatter plot of estimated random-location and -scale
* effects
scatter theta2 theta1, mlabel(id)

* Keep only those variables needed in subsequent tables
keep id hamdep week theta1 theta2

* Rename the response variable
rename hamdep hd

* Reshape the data from "long form" to "wide form"
reshape wide hd, i(id) j(week)

* Set the display precision of theta1 and theta2 to three decimal places
format %4.3f theta1 theta2

* Sort the data in descending order by theta2
gsort -theta2

* List the two subjects with the highest scale estimates
list id theta2 hd? if inlist(_n, 1, 2)

* List the two subjects with the lowest scale estimates
list id theta2 hd? if inlist(_n, 65, 66)

* Sort the data in ascending order by theta2
sort theta2

* List the two subjects in the bottom left corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 117, 347)

* List the subject in the bottom right corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 345)

* List the subject in the top left corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 505)

* List the four subjects in the top right corner of Figure 3
list id theta1 theta2 hd? if inlist(id, 607, 322, 328, 360)

*-------------------------------------------------------------------------------
* SECTION 5.4. Additional analyses...
*-------------------------------------------------------------------------------

* Reload the Reisby data into memory
use "http://www.bristol.ac.uk/cmm/media/runmixregls/reisby", clear
// In contrast to the JSS article, here we refer to the version of the data
// saved on the web.

* Recode missing values from -9 to Stata's system missing value
recode hamdep (-9 = .)

* Declare the data to be panel data
xtset id

* Refit the model, removing the non-significant association between the WS
* variance and the random-location effects and increasing the number of
* integration points to 15
runmixregls hamdep week endog endweek, ///
    between(endog)                     ///
    within(week endog)                 ///
    association(none)                  ///
    intpoints(15)

* List the estimation results saved by runmixregls
ereturn list

* List the stage 2 parameter estimates
matrix list e(b_2)

* Make a copy of the stage 2 parameter estimates, e(b_2)
matrix b_2 = e(b_2)

* Make a copy of the stage 2 variance-covariance matrix, e(V_2)
matrix V_2 = e(V_2)

* Store the stage 2 coefficient vector and variance-covariance matrix
ereturn post b_2 V_2

* Display the stage 2 estimates table
ereturn display

********************************************************************************
* SECTION 6. Example 2 - Positive mood study
********************************************************************************

*-------------------------------------------------------------------------------
* SECTION 6.1. Data
*-------------------------------------------------------------------------------

* Load the Posmood data into memory
use "http://www.bristol.ac.uk/cmm/media/runmixregls/posmood", clear
// In contrast to the JSS article, here we refer to the version of the data
// saved on the web.
* Describe the data contents
codebook, compact

* Generate a unique observation identifier within each subject
bysort subject: gen observation = _n

* Label the new variable
label var observation "Observation"

* Declare the data to be panel data
xtset subject observation

* Figure 4 - Spaghetti plot of positive mood scores plotted for the first
* 20 subjects
xtline posmood if subject<=20, byopts(style(compact))

*-------------------------------------------------------------------------------
* SECTION 6.2. Model 1
*-------------------------------------------------------------------------------

* Fit Model 1
runmixregls posmood alone genderf, ///
    between(alone genderf)         ///
    within(alone genderf)

* Store the estimation results
estimates store model1

* Generate subject group indicator variable
generate group = .
replace group = 1 if genderf==0 & alone==0
replace group = 2 if genderf==0 & alone==1
replace group = 3 if genderf==1 & alone==0
replace group = 4 if genderf==1 & alone==1

* Define value labels for the variable group
label define grouplabel 1 "Male, not alone"   ///
                        2 "Male, alone"       ///
                        3 "Female, not alone" ///
                        4 "Female, alone"

* Attach these value labels to the variable group
label values group grouplabel

* Predict the BS variance for each subject
predictnl BSvariance = exp(xb(Between))

* Tabulate the BS variance by subject group
tabstat BSvariance, by(group) nototal format(%4.3f)

* Redisplay the current (active) estimation results, but present the
* parameter names rather than displaying the statistics for these
* coefficients.
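* (The legend reports the name, e.g. _b[Within:alone], by which each
* estimate can be referenced in the predictnl, lincom, and twoway function
* expressions used later in this do-file.)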
runmixregls, coeflegend

* Predict the WS variance for each subject
predictnl WSvariance = exp(xb(Within) ///
    + 0.5 * (_b[Association:linear]^2 + _b[Scale:sigma]^2))

* Tabulate the WS variance by subject group
tabstat WSvariance, by(group) nototal format(%4.3f)

* Predict the ICC for each subject
predictnl ICC = exp(xb(Between))/(exp(xb(Between)) + exp(xb(Within) ///
    + 0.5 * (_b[Association:linear]^2 + _b[Scale:sigma]^2)))

* Tabulate the ICC by subject group
tabstat ICC, by(group) nototal format(%4.3f)

*-------------------------------------------------------------------------------
* SECTION 6.3. Model 2
*-------------------------------------------------------------------------------

* Generate the interaction between alone and genderf
generate algenf = alone * genderf

* Fit Model 2
runmixregls posmood alone genderf algenf, ///
    between(alone genderf algenf)         ///
    within(alone genderf algenf)

* Store the estimation results
estimates store model2

* Perform a likelihood ratio test comparing Models 1 and 2
lrtest model1 model2

* Predict the gender difference in mean mood when subjects are alone
lincom _b[Mean:genderf] + _b[Mean:algenf]

*-------------------------------------------------------------------------------
* SECTION 6.4. Model 3
*-------------------------------------------------------------------------------

* Fit Model 3
runmixregls posmood alone genderf algenf, ///
    between(alone genderf algenf)         ///
    within(alone genderf algenf)          ///
    association(quadratic)

* Store the estimation results
estimates store model3

* Perform a likelihood ratio test comparing Models 2 and 3
lrtest model2 model3

*-------------------------------------------------------------------------------
* SECTION 6.5. Additional analyses...
*-------------------------------------------------------------------------------

* Figure 5 - Predicted WS variance functions plotted against the
* random-location effect, separately for each subject group
twoway                                                                 ///
    (function exp(_b[Within:_cons]                                     ///
        + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
        + 0.5 * (_b[Scale:sigma]^2)), range(-3 3))                     ///
    (function exp(_b[Within:_cons] + _b[Within:alone]                  ///
        + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
        + 0.5 * (_b[Scale:sigma]^2)), range(-3 3))                     ///
    (function exp(_b[Within:_cons] + _b[Within:genderf]                ///
        + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
        + 0.5 * (_b[Scale:sigma]^2)), range(-3 3))                     ///
    (function exp(_b[Within:_cons] + _b[Within:alone]                  ///
        + _b[Within:genderf] + _b[Within:algenf]                       ///
        + _b[Association:linear] * x + _b[Association:quadratic] * x^2 ///
        + 0.5 * (_b[Scale:sigma]^2)), range(-3 3)),                    ///
    ytitle(WS variance)                                                ///
    xtitle(Random-location effect)                                     ///
    legend(order(1 "Male, not alone"                                   ///
                 2 "Male, alone"                                       ///
                 3 "Female, not alone"                                 ///
                 4 "Female, alone"))

********************************************************************************
* SECTION 7. Simulation study
********************************************************************************

* Clear existing data and set the initial random number seed for replication
* purposes
clear
set seed 21561561

* Perform 1000 replications. Note: edit the 1/10 to 1/1000 to perform the
* full 1000 replications. We have reduced it to 10 replications here for
* illustrative purposes as it takes a long time to run the full 1000
* replications.
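* For reference, the data-generating model used inside the loop below is
* (restating the generate commands in equation form):
*
*   y_ij = 0 + 0.5*x_ij + 0.2*z_j + sigma_v*theta1_j + e_ij
*   e_ij ~ N(0, sigma2_e_ij)
*   sigma2_e_ij = exp(-0.61 + 0.1*x_ij + 0.1*z_j + sigma_w*theta2_j)
*
* with sigma_v = sqrt(exp(-2.3)), sigma_w = 0.45, and standardized random
* effects theta1_j, theta2_j ~ N(0, 1).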
forvalues r = 1/10 {

    * Display current replication number
    display "Replication = `r'"

    * Clear any existing data from memory
    clear

    * Specify the new dataset to have 250 records, one for each subject
    set obs 250

    * Generate a subject identifier variable
    generate subject = _n

    * Generate the single subject-level covariate
    generate z = rnormal(0, 1)

    * Generate the standardized random-location effects
    generate theta1 = rnormal(0, 1)

    * Generate the standardized random-scale effects
    generate theta2 = rnormal(0, 1)

    * Expand the data from one record per subject to ten records per
    * subject, one for each observation
    expand 10

    * Generate the observation identifier variable
    bysort subject: generate obs = _n

    * Generate the single observation-level covariate
    generate x = rnormal(0, 1)

    * Generate the BS standard deviation for the random-location effects
    generate sigma_v = sqrt(exp(-2.3))

    * Generate the BS standard deviation for the random-scale effects
    generate sigma_w = 0.45

    * Generate the WS variance
    generate sigma2_e = exp(-0.61 + 0.1 * x + 0.1 * z + sigma_w * theta2)

    * Generate the residual errors
    generate e = rnormal(0, sqrt(sigma2_e))

    * Generate the response variable
    generate y = 0 + 0.5 * x + 0.2 * z + sigma_v * theta1 + e

    * Declare the data to be a panel
    xtset subject

    * Fit the mixed-effects location scale model
    runmixregls y x z, within(x z) association(none)

    * Clear the existing data from memory
    clear

    * Change the number of records in the dataset to eight, one for each
    * parameter
    set obs 8

    * Generate a parameter identifier
    generate byte parameter = _n

    * Create a copy of the stage 2 coefficient vector
    matrix b_2 = e(b_2)'

    * Create a copy of the stage 2 variance-covariance matrix
    matrix V_2 = e(V_2)

    * Create a copy of the stage 3 coefficient vector
    matrix b_3 = e(b_3)'

    * Create a copy of the stage 3 variance-covariance matrix
    matrix V_3 = e(V_3)

    * Create a column vector of the stage 2 parameter standard errors as the
    * square roots of the leading diagonal of its variance-covariance matrix
    matrix se_2 = vecdiag(cholesky(diag(vecdiag(V_2))))'

    * Create a column vector of the stage 3 parameter standard errors
    matrix se_3 = vecdiag(cholesky(diag(vecdiag(V_3))))'

    * Store the stage 2 coefficient vector as a new variable
    svmat b_2

    * Store the stage 2 standard error vector as a new variable
    svmat se_2

    * Store the stage 3 coefficient vector as a new variable
    svmat b_3

    * Store the stage 3 standard error vector as a new variable
    svmat se_3

    * Remove the 1 suffix from the newly created variable names
    rename (*1) (*)

    * Set the display precision of the four variables to three decimal places
    format %4.3f b_2 se_2 b_3 se_3

    * List the dataset of stage 2 and stage 3 model results
    list, abbreviate(9) separator(0)

    * Save the model results to disk, saving the replication number in the
    * filename
    save "rep_`r'.dta", replace

}

* Clear the existing data from memory
clear

* Generate a new variable to hold the replication number
generate rep = .

* Append the 1000 replications. Note: edit 1/10 to 1/1000 to append together
* the full 1000 replications. You must have first run the first loop for
* 1000 replications.
forvalues r = 1/10 {

    * Append the model results of the current replication to all previous
    * replications
    append using "rep_`r'"

    * Replace missing values of the replication identifier with the current
    * replication number
    replace rep = `r' if rep==.

}

* Generate a new variable which holds the true value associated with each
* parameter
generate truevalue = .
replace truevalue = 0.5   if parameter==1
replace truevalue = 0.2   if parameter==2
replace truevalue = 0.0   if parameter==3
replace truevalue = -2.3  if parameter==4
replace truevalue = 0.1   if parameter==5
replace truevalue = 0.1   if parameter==6
replace truevalue = -0.61 if parameter==7
replace truevalue = 0.45  if parameter==8

* Generate the percentage bias for each parameter (note that percentage bias
* is undefined for parameter 3, whose true value is zero, so these variables
* are missing for that parameter)
generate b_2_bias = 100 * (b_2 - truevalue)/truevalue
generate b_3_bias = 100 * (b_3 - truevalue)/truevalue

* Set the display precision of the percentage bias variables to be to the
* nearest integer
format %3.0f b_2_bias b_3_bias

* Generate coverage indicators for whether the true values are contained
* within the 95% confidence intervals
generate b_2_cov = 100 * (inrange(truevalue, b_2 - invnorm(0.975) * se_2, ///
    b_2 + invnorm(0.975) * se_2))
generate b_3_cov = 100 * (inrange(truevalue, b_3 - invnorm(0.975) * se_3, ///
    b_3 + invnorm(0.975) * se_3))

* Set the display precision of the coverage indicator variables to be to
* the nearest integer
format %3.0f b_2_cov b_3_cov

* Average the model results for each parameter over the replications
collapse (mean) b_2 se_2 b_3 se_3 b_2_bias b_3_bias b_2_cov b_3_cov, ///
    by(parameter truevalue)

* Define value labels for the parameter identifier
label define parlabel 1 "beta_1" 2 "beta_2" 3 "beta_0" 4 "alpha_0" ///
    5 "tau_1" 6 "tau_2" 7 "tau_0" 8 "sigma_w"

* Attach these value labels to the parameter identifier
label values parameter parlabel

* List the averaged parameter estimates and the percentage bias statistics
list parameter truevalue b_2 b_3 b_2_bias b_3_bias, ///
    noobs abbreviate(9) separator(0)

* List the averaged standard errors
list parameter truevalue se_2 se_3, noobs abbreviate(9) separator(0)

* List the coverage rates
list parameter truevalue b_2_cov b_3_cov, noobs abbreviate(9) separator(0)

********************************************************************************
* SECTION 8. Conclusions
********************************************************************************

********************************************************************************

exit