Case Study: Reproducing Goldstein et al. (1993)

This case study was conducted as part of the ESRC grant 'The use of interactive electronic-books in the teaching and application of modern quantitative methods in the social sciences', and was undertaken with the kind involvement of Harvey Goldstein.

Harvey is Professor of Social Statistics at the School of Education, University of Bristol, and Professor of Statistics at the UCL Institute of Child Health; he is also Visiting Professor at the London School of Hygiene and Tropical Medicine and at the University of East Anglia.

We explored the extent to which a previously published paper could be replicated using the Stat-JR:DEEP eBook interface. The paper is: Goldstein, H., Rasbash, J., Yang, M., Woodhouse, G., Pan, H., Nuttall, D., and Thomas, S. (1993). A Multilevel Analysis of School Examination Results. Oxford Review of Education, 19(4), 425-433.

The resulting eBook can be downloaded via the following link:

Reproducing Goldstein et al 93 (zip, 524 kb)

This contains the eBook, and a supporting Stat-JR template (the other templates called by the eBook are all provided as part of the standard Stat-JR release); for a flavour of the content of this eBook, please see the screenshot below.

As the eBook demonstrates, we were able to reproduce the models that Goldstein et al. investigated, albeit using a subset of the original dataset. This subset is available from the UK Data Service (SN: 4043, Sample of GCSE Examination Results for Pupils from London Schools, 1990). To run the eBook, you will therefore need to download this dataset from the UK Data Service, having first registered and gained the necessary permissions to do so, as per their terms and conditions.

The original paper analysed the GCSE exam scores of 5,748 students across 66 schools in Inner London (pupils usually take these exams at 16 years of age), using multilevel models with students nested within schools.

The subset we use consists of 4,059 pupils across 65 schools; it is therefore similar to the 'tutorial' dataset available as a sample dataset with MLwiN, although it additionally includes mathematics and English exam scores. Unlike the dataset analysed in Goldstein et al. (1993), however, it does not have school religious denomination (which Goldstein et al. included as an explanatory variable in their first model), so we do not include that in our analyses.

We also excluded the variable prcenfs. The documentation accompanying the dataset describes this as the percentage of children in a school on free school meals; however, it consists only of the values 1, 2, 3, and 4, and so appears unlikely to be the actual percentage. Given that it is not clear how this variable is coded, and mindful of the errors in the school identifier variable (see below), we decided to drop it.
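A check of this kind can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the eBook: the helper names, the threshold of 10 levels, and the toy rows are all assumptions made for the example, and the real data would need to be loaded from the UK Data Service file.

```python
def distinct_values(rows, column):
    """Return the sorted distinct values found in `column`."""
    return sorted({row[column] for row in rows})


def looks_like_coded_category(values, max_levels=10):
    """Heuristic (an assumption for this sketch): a handful of whole
    numbers suggests category codes rather than measured percentages."""
    return len(values) <= max_levels and all(
        float(v).is_integer() for v in values
    )


# Hypothetical toy rows standing in for the real dataset.
rows = [{"prcenfs": v} for v in [1, 2, 3, 4, 2, 1, 3]]

values = distinct_values(rows, "prcenfs")
is_coded = looks_like_coded_category(values)
print(values, is_coded)
```

A variable that really held percentages would normally show many distinct, often non-integer, values across 65 schools, so a result like the one above is a reasonable prompt to query the coding.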

There also appeared to be an error in the school identifier in the dataset available from the UK Data Service: it contained only 9 unique IDs, and the gender of the school changed within runs of the same ID, which cannot be correct. In other respects, the rows looked the same as, or very similar to, those in MLwiN's 'tutorial' dataset (some variables may have been standardized with reference to a larger sample, for example), which suggests that the same school IDs apply row by row. We therefore copied over the school ID column from the 'tutorial' dataset.
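The two steps described here, spotting IDs whose school-level attribute varies and then copying an ID column across by row position, might look as follows in Python. This is a minimal sketch under stated assumptions: the column names `school` and `schgend`, the helper functions, and the toy rows are hypothetical, and the eBook itself may have carried out these steps differently.

```python
from collections import defaultdict


def ids_with_mixed_values(rows, id_col, attr_col):
    """Return IDs for which a school-level attribute takes more than
    one value, which should be impossible for a valid school ID."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[id_col]].add(row[attr_col])
    return sorted(key for key, vals in seen.items() if len(vals) > 1)


def copy_column_by_row(target_rows, source_rows, column):
    """Return copies of target_rows with `column` taken from the
    source row at the same position. Assumes both are in the same order."""
    if len(target_rows) != len(source_rows):
        raise ValueError("row counts differ; cannot align by row number")
    return [
        {**target, column: source[column]}
        for target, source in zip(target_rows, source_rows)
    ]


# Hypothetical toy rows: ID 1 covers both a mixed and a girls' school.
ukds_rows = [
    {"school": 1, "schgend": "mixed"},
    {"school": 1, "schgend": "girls"},
    {"school": 2, "schgend": "boys"},
]
tutorial_rows = [{"school": 1}, {"school": 2}, {"school": 3}]

suspect_ids = ids_with_mixed_values(ukds_rows, "school", "schgend")
fixed_rows = copy_column_by_row(ukds_rows, tutorial_rows, "school")
remaining = ids_with_mixed_values(fixed_rows, "school", "schgend")
print(suspect_ids, remaining)
```

Copying by row position is only safe when the two files list the same pupils in the same order, so it is worth comparing the other shared columns row by row before relying on the copied IDs.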

Screenshot: the 'Reproducing Goldstein et al 93' eBook
