Datasets used in reviews

The following datasets are used in the reviews.


The data are on 3,435 children who attended 148 primary schools and 19 secondary schools in Scotland. There are 11 fields in the data set, of which the following are used.

  1. VRQ: A verbal reasoning score from tests pupils took when they entered secondary school
  2. ATTAIN: Attainment score of pupils at age 16
  3. PID: Primary school identifying code
  4. SEX: Pupil's gender
    0 = boy
    1 = girl
  5. SC: Pupil's social class scale (continuous scale score from low to high social class)
  6. SID: Secondary school identifying code

Model to be fitted: Cross-classified model.
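A cross-classified model is needed when pupils from one primary school go on to more than one secondary school, so that PID is not nested within SID. The sketch below shows one way to check for this. The function name and the (PID, SID) pairs are hypothetical and are not taken from the real file.

```python
from collections import defaultdict


def is_nested(pairs):
    """Return True if every primary school feeds exactly one secondary school."""
    secondary_by_primary = defaultdict(set)
    for pid, sid in pairs:
        secondary_by_primary[pid].add(sid)
    return all(len(sids) == 1 for sids in secondary_by_primary.values())


# Hypothetical (PID, SID) pairs, one per pupil.
nested_example = [(1, 10), (1, 10), (2, 10), (3, 11)]
crossed_example = [(1, 10), (1, 11), (2, 10), (3, 11)]

print(is_nested(nested_example))
print(is_nested(crossed_example))
```

If the check returns False, the two classifications are crossed and a purely hierarchical three-level model would not describe the structure.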


Paterson, L. (1991). Socio economic status and educational attainment: a multidimensional and multilevel study. Evaluation and Research in Education, 5: 97-121.


These data come from the 1988 Bangladesh Fertility Survey. The file consists of a subsample of 1,934 women grouped in 60 districts. The variables are defined as follows:

  1. WOMAN: identifying code for each woman
  2. DISTRICT: identifying code for each district
  3. USE: Contraceptive use at time of survey (response variable)
    1 = using contraception
    0 = not using contraception
  4. LC: Number of living children at time of survey
    1 = None
    2 = 1
    3 = 2
    4 = 3 or more
  5. AGE: Age of woman at time of survey (in years), centred around mean
  6. URBAN: Type of region of residence
    1 = Urban
    0 = Rural
  7. CONSTANT = 1

Model to be fitted: Two-level main effects model with logistic and other link functions with all covariates.
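As a rough illustration of what changing the link function means for a binary response such as USE, the sketch below maps a linear predictor to a probability under three links. The source does not say which other links were used. The probit and complementary log-log links shown here are assumed as common alternatives, and the function names are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse logit link: probability from a linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    """Inverse complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


# Compare the three links over a few values of the linear predictor.
for eta in (-1.0, 0.0, 1.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

The logit and probit links are symmetric about a probability of 0.5, whereas the complementary log-log link is not.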


Huq, N. M., and Cleland, J. (1990). Bangladesh Fertility Survey 1989 (Main Report) Dhaka: National Institute of Population Research and Training.


The data set consists of 7 fields, in the following order:

  1. ID for Local Education Authorities
  2. ID for schools
  3. ID for individuals
  4. Point score on A-level Chemistry in 1997
  5. Gender
    0 = M
    1 = F
  6. Age in months, centred at 222 months or 18.5 years
  7. Average GCSE score of individual, centred at the mean

This data set is relatively large with 31,022 individuals in 2,280 schools. Note that while this is used, illustratively, to fit Normal response models, the distribution of the response is not well described by a Normal distribution.

The following models are fitted: Three-level variance components and random slope models for continuous Normal outcomes, with main effects and interactions between GCSE and gender, or between GCSE and age.


Fielding, A., Yang, M., and Goldstein, H. (2003). Multilevel ordinal models for examination grades. Statistical Modelling, 3 (2): 127-153.


This set of data consists of 4,059 students from 65 schools in Inner London.

The variables in order are:

  1. School ID
  2. Student ID
  3. Normalised exam score as outcome variable
  4. Constant vector = 1
  5. Standardised LR test score (STANDLRT)
  6. Student gender
    0 = boys
    1 = girls
  7. School gender
    1 = mixed school
    2 = boys school
    3 = girls school
  8. School average of intake score
  9. Student-level Verbal Reasoning (VR) score band at intake
    1 = bottom 25%
    2 = mid 50%
    3 = top 25%
  10. Band of student's intake score
    1 = bottom 25%
    2 = mid 50%
    3 = top 25%

The dataset is used to fit 2-level Normal response models including complex level 1 variation for gender and LRT score.
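For readers unfamiliar with the term, complex level 1 variation means that the student-level variance is allowed to depend on explanatory variables. A minimal sketch for a single predictor, assuming a random-intercept model in which $x_{ij}$ is the standardised LRT score of student $i$ in school $j$ (the notation is introduced here and is not taken from the source):

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{0ij} + e_{1ij} x_{ij}

\operatorname{var}(e_{0ij} + e_{1ij} x_{ij})
  = \sigma^2_{e0} + 2\sigma_{e01} x_{ij} + \sigma^2_{e1} x_{ij}^2
```

With $\sigma_{e01} = \sigma^2_{e1} = 0$ this reduces to the usual constant level 1 variance $\sigma^2_{e0}$.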


Goldstein, H., Rasbash, J., et al. (1993). A multilevel analysis of school examination results. Oxford Review of Education, 19: 425-433.


The data contain GCSE exam scores on a science subject. Two components of the exam were chosen as outcome variables: written paper and coursework. There are 1,905 students from 73 schools in England. The five fields are as follows.

Missing values are coded as -1.

  1. School ID
  2. Student ID
  3. Gender of student
    0 = boy
    1 = girl
  4. Total score of written paper
  5. Total score of coursework paper
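Because missing values are coded as -1, they need to be recoded before analysis so that they are not treated as real scores. The sketch below shows one way to do this, assuming the missing code applies to the two score fields. The rows and the function name are hypothetical.

```python
MISSING_CODE = -1  # missing-value code stated in the data description


def clean_scores(row):
    """Replace the missing code in the two score fields with None."""
    school, student, gender, written, coursework = row
    written = None if written == MISSING_CODE else written
    coursework = None if coursework == MISSING_CODE else coursework
    return (school, student, gender, written, coursework)


# Hypothetical rows: (school ID, student ID, gender, written, coursework).
rows = [(1, 1, 0, 23, 72), (1, 2, 1, -1, 80), (2, 1, 1, 41, -1)]
cleaned = [clean_scores(r) for r in rows]

# Students with both responses observed.
complete = [r for r in cleaned if r[3] is not None and r[4] is not None]
print(len(complete))
```

In a bivariate response model a student with one missing component can still contribute the observed component, so dropping incomplete rows is not necessarily required.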

Models fitted are bivariate Normal response models.


Multivariate response models. (2000). In Rasbash, J., et al., A user’s guide to MLwiN. Institute of Education, University of London.


These data have been used to illustrate meta-analysis. They come from a collection of 19 published studies of the effect of teacher expectancy on pupil IQ. There are four fields in the file:

  1. Effect size estimate
  2. SE of effect size estimate
  3. Weeks of prior contact
  4. Study code

Model to be fitted: Two-level meta-analysis model.
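The effect size estimates and their standard errors are the ingredients of any precision-weighted summary. As a simpler point of comparison with the two-level model (it is not that model, since it ignores between-study variation), the sketch below computes an inverse-variance weighted pooled effect. The function name and the numbers are hypothetical.

```python
def pooled_effect(effects, standard_errors):
    """Inverse-variance weighted mean of study effects and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return estimate, pooled_se


# Hypothetical effect sizes and standard errors for two studies.
estimate, pooled_se = pooled_effect([0.2, 0.4], [0.1, 0.1])
print(estimate, pooled_se)
```

A two-level model adds a between-study variance to each study's sampling variance, which makes the weights more equal across studies.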


Raudenbush, S., Bryk, A., Cheong, Y. F., and Congdon, R. (2001). HLM 5: Hierarchical Linear and Nonlinear Modelling. SSI, Lincolnwood.


These data come from a study of malignant melanoma mortality in the European Community and its association with UV radiation exposure. Seven fields are included in the dataset.

  1. Nation
    1 = Belgium
    2 = W. Germany
    3 = Denmark
    4 = France
    5 = UK
    6 = Italy
    7 = Ireland
    8 = Luxembourg
    9 = Netherlands
  2. Region ID
  3. County ID
  4. Number of male deaths due to MM during 1971-1980
  5. Number of expected deaths
  6. Constant = 1
  7. Measure of the UVB dose reaching the earth's surface in each county, centred

The following models are fitted to the data:
a. Three-level Poisson model
b. Two-level Poisson model with dummy variables for the level 3 (Nation) units


Langford, I.H., Bentham, G. and McDonald, A. (1998). Multilevel modelling of geographically aggregated health data: a case study on malignant melanoma mortality and UV exposure in the European community. Statistics in Medicine, 17: 41-58.


Measurements on height of 26 boys were taken on 9 occasions between the ages of 11 and 13 years. There are five fields in the data:

  1. Individual ID
  2. Age in years centred at 12 years
  3. Height in cm
  4. Occasion number
  5. Season in decimal years

The models fitted are: Two-level growth curve models for repeated measures, including models with a serial correlation structure at level 1.


Goldstein, H., Healy, M. J. R., and Rasbash, J. (1994). Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13: 1643-1655.


The data come from the British Social Attitudes (BSA) Survey, which started in 1983. Eligible persons were all adults aged 18 or over living in private households in Britain. The data consist of the complete responses of 264 respondents out of 410. The variables are defined as follows:

  1. District ID
  2. Respondent code (within district)
  3. Year code
    1 = 1983
    2 = 1984
    3 = 1985
    4 = 1986
  4. Number of positive answers to seven questions
    0 = none
    1 = yes for one item
    2 = yes for two items
    3 = yes for three items
    4 = yes for four items
    5 = yes for five items
    6 = yes for six items
    7 = yes for seven items
    Note: categories 0 and 1 are combined for analyses.
  5. Party chosen
    1 = Conservative
    2 = Labour
    3 = Lib/SDP/Alliance
    4 = others
    5 = none
  6. Self assessed social class
    1 = middle
    2 = upper working
    3 = lower working
  7. Gender
    1 = male
    2 = female
  8. Age in years
  9. Religion
    1 = Roman Catholic
    2 = Protestant/Church of England
    3 = others
    4 = none

Models fitted are: Multinomial response (ordered or unordered) models with repeated measures from respondents over 4 years.
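The note under field 4 says that categories 0 and 1 are combined for analyses. One way to apply that recoding is sketched below. The function name is hypothetical, and folding 0 into category 1 (rather than the reverse) is an assumption.

```python
def recode_positive_answers(count):
    """Combine categories 0 and 1 of the positive-answer count into one."""
    if not 0 <= count <= 7:
        raise ValueError("count must be between 0 and 7")
    return max(count, 1)


# Hypothetical responses for one respondent over the four years.
responses = [0, 1, 3, 7]
print([recode_positive_answers(c) for c in responses])
```

After recoding, the response has seven categories (1 to 7) rather than eight.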


McGrath, K. and Waterton, J. (1986). British Social Attitudes 1983-1986 panel survey. London, Social and Community Planning Research.

There is a variety of software you can use to unzip files, e.g. WinZip or CAM UnZip.
