Datasets used in reviews

The following datasets are used in the reviews.

2LEV-XC (Cross classified model) - Data on pupil attainment
BANG (Binary response model) - Data on contraceptive use
CHEM97 (2/3-level Normal response models) - A/AS-level examination data from England
EXAM (2-level Normal response + complex level 1 variation) - GCSE Examination data from London
GCSEMV (Bivariate Normal response model) - Two examination scores on a set of students, including missing responses
META (Meta analysis model) - Data on teacher expectations of students
MMMEC (Poisson response) - Study of Malignant Melanoma Mortality in Europe
OXBOYS (Normal response repeated measures) - Repeated measures of height of boys
SOCATT (Multicategory response model) - Longitudinal data on social attitudes
Download these datasets (zip, 0.3 MB)

2LEV-XC

The data are on 3,435 children who attended 148 primary schools and 19 secondary schools in Scotland. There are 11 fields in the data set of which the following are used.

VRQ: A verbal reasoning score from tests pupils took when they entered secondary school
ATTAIN: Attainment score of pupils at age 16
PID: Primary school identifying code
SEX: Pupil's gender
0 = boy
1 = girl
SC: The social class of the pupil's father's occupation, where 1 is "Professional occupation", 20 is "Intermediate occupation", 31 is "Skilled occupation", and 0 is "Partly skilled, unskilled, or no occupation". The ordering of these categories is 1 > 20 > 31 > 0, and the variable corresponds to the DADRG variable from Paterson (1991).
SID: Secondary school identifying code

Model to be fitted: Cross-classified model.
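As a sketch of what the cross-classified model could look like, assuming a simple variance components specification with no covariates (the notation below is an assumption and is not taken from the dataset documentation), the attainment of pupil i who attended primary school j and secondary school k might be written as:

```latex
\begin{aligned}
\text{ATTAIN}_{i(jk)} &= \beta_0 + u_{j} + v_{k} + e_{i(jk)}, \\
u_{j} &\sim N(0, \sigma^2_{u}) \quad \text{(primary school, PID)}, \\
v_{k} &\sim N(0, \sigma^2_{v}) \quad \text{(secondary school, SID)}, \\
e_{i(jk)} &\sim N(0, \sigma^2_{e}) \quad \text{(pupil)}.
\end{aligned}
```

Pupils are nested within the combination of primary and secondary school rather than within a single hierarchy, which is why the two school effects enter additively.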

Reference

Paterson, L. (1991). Socio-economic status and educational attainment: a multidimensional and multilevel study. Evaluation and Research in Education, 5: 97-121.


BANG

These data come from the 1988 Bangladesh Fertility Survey. The file consists of a subsample of 1,934 women grouped in 60 districts. The variables are defined as follows:

WOMAN: identifying code for each woman
DISTRICT: identifying code for each district
USE: Contraceptive use at time of survey (the response variable)
1 = using contraception
0 = not using contraception
LC: Number of living children at time of survey
1 = None
2 = 1
3 = 2
4 = 3 or more
AGE: Age of woman at time of survey (in years), centred around mean
URBAN: Type of region of residence
1 = Urban
0 = Rural
CONSTANT: Constant vector = 1

Model to be fitted: Two-level main effects model with logistic and other link functions with all covariates.
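The coding of LC and the logistic link can be illustrated with a small sketch. The choice of LC = 1 (no children) as the reference category and the function names are assumptions made for illustration; they are not specified by the dataset documentation.

```python
import math

# LC codes as documented above: 1 = none, 2 = one, 3 = two, 4 = three or more.
LC_LABELS = {1: "none", 2: "1", 3: "2", 4: "3 or more"}


def lc_dummies(lc):
    """Return indicator variables for LC codes 2, 3 and 4.

    LC = 1 (no living children) is treated as the reference category,
    which is an assumption made for this sketch.
    """
    if lc not in LC_LABELS:
        raise ValueError(f"unexpected LC code: {lc}")
    return [1 if lc == level else 0 for level in (2, 3, 4)]


def inverse_logit(eta):
    """Map a linear predictor to a probability under the logistic link."""
    return 1.0 / (1.0 + math.exp(-eta))
```

For example, `lc_dummies(3)` gives the indicators for a woman with two living children, and `inverse_logit` converts a fitted linear predictor into a probability of contraceptive use.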

Reference

Huq, N. M., and Cleland, J. (1990). Bangladesh Fertility Survey 1989 (Main Report). Dhaka: National Institute of Population Research and Training.


CHEM97

This dataset consists of 7 fields in the following order:

ID for Local Education Authorities
ID for schools
ID for individuals
Point score on A-level Chemistry in 1997
Gender
0 = M
1 = F
Age in months, centred at 222 months or 18.5 years
Average GCSE score of individual, centred at the mean

This data set is relatively large with 31,022 individuals in 2,280 schools. Note that while this is used, illustratively, to fit Normal response models, the distribution of the response is not well described by a Normal distribution.
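Because the age field is stored relative to 222 months, the original age can be recovered by adding the centring constant back. The following is a minimal sketch; the function names are hypothetical.

```python
CENTRE_MONTHS = 222  # centring constant documented above (18.5 years)


def age_in_months(centred_age):
    """Undo the centring to recover age in months."""
    return centred_age + CENTRE_MONTHS


def age_in_years(centred_age):
    """Recover age in years from the centred age in months."""
    return age_in_months(centred_age) / 12
```

A centred value of 0 therefore corresponds to 18.5 years, and a value of -6 to 18 years.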

The following models are fitted: Three-level variance components and random slope models for continuous Normal outcomes, with main effects and interactions between GCSE score and gender, or between GCSE score and age.

Reference

Fielding, A., Yang, M., and Goldstein, H. (2003). Multilevel ordinal models for examination grades. Statistical Modelling, 3 (2): 127-153.


EXAM

This set of data consists of 4,059 students from 65 schools in Inner London.

The variables in order are:

School ID
Student ID
Normalised exam score as outcome variable
Constant vector = 1
Standardised LR test score (STANDLRT)
Student gender
0 = boys
1 = girls
School gender
1 = mixed school
2 = boys school
3 = girls school
School average of intake score
Student-level Verbal Reasoning (VR) score band at intake
1 = bottom 25%
2 = mid 50%
3 = top 25%
Band of student's intake score
1 = bottom 25%
2 = mid 50%
3 = top 25%

The dataset is used to fit 2-level Normal response models including complex level 1 variation for gender and LRT score.
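Complex level 1 variation means that the student-level variance is allowed to depend on explanatory variables. As a sketch for the LRT score only, with x denoting STANDLRT for student i in school j (the notation and the particular specification are assumptions, not taken from the reviews):

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{0ij} + e_{1ij}\,x_{ij}, \\
\operatorname{var}(e_{0ij} + e_{1ij}\,x_{ij})
  &= \sigma^2_{e0} + 2\,\sigma_{e01}\,x_{ij} + \sigma^2_{e1}\,x_{ij}^2 .
\end{aligned}
```

Under this form the level 1 variance is a quadratic function of the LRT score rather than a single constant.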

Reference

Goldstein, H., Rasbash, J., et al. (1993). A multilevel analysis of school examination results. Oxford Review of Education, 19: 425-433.


GCSEMV

The data contain GCSE exam scores on a science subject. Two components of the exam were chosen as outcome variables: the written paper and the coursework. There are 1,905 students from 73 schools in England. The five fields are as follows.

Missing values are coded as -1.

School ID
Student ID
Gender of student
0 = boy
1 = girl
Total score of written paper
Total score of coursework paper

Models fitted are bivariate Normal response models.
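Since missing responses are coded as -1, they need to be recoded before any analysis so that they are not treated as genuine scores. The sketch below shows one way to do this; the function names are hypothetical.

```python
MISSING_CODE = -1  # missing-value code documented for this dataset


def recode_missing(score):
    """Return None for the missing-value code, otherwise the score unchanged."""
    return None if score == MISSING_CODE else score


def response_pattern(written, coursework):
    """Describe which of the two responses is observed for one student."""
    w = recode_missing(written)
    c = recode_missing(coursework)
    if w is None and c is None:
        return "neither"
    if w is None:
        return "coursework only"
    if c is None:
        return "written only"
    return "both"
```

Tabulating `response_pattern` over all students gives a quick check of how many contribute one response rather than two to the bivariate model.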

Reference

Multivariate response models. (2000). In Rasbash, J., et al., A user’s guide to MLwiN, Institute of Education, University of London.


MMMEC

This set of data comes from the study of Malignant Melanoma Mortality in the European Community associated with the impact of UV radiation exposure. Seven fields are included in the dataset.

Nation
1 = Belgium
2 = W. Germany
3 = Denmark
4 = France
5 = UK
6 = Italy
7 = Ireland
8 = Luxembourg
9 = Netherlands
Region ID
County ID
Number of male deaths due to MM during 1971-1980
Number of expected deaths
Constant = 1
Measure of the UVB dose reaching the earth's surface in each county, centred

The following models are fitted to the data:
a. Three-level Poisson model
b. Two-level Poisson model with dummy variables for the level 3 (Nation) units
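In Poisson models for counts of this kind, the expected number of deaths commonly enters as an offset on the log scale, so that the model describes relative risk. The sketch below assumes that usage (it is not stated in the dataset description) and uses hypothetical function names.

```python
import math


def standardised_mortality_ratio(observed, expected):
    """Ratio of observed to expected deaths for one county."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def log_offset(expected):
    """Offset term log(expected) for a Poisson model with a log link."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return math.log(expected)
```

A ratio above 1 indicates more deaths than expected in that county; the counts passed in should come from the observed and expected deaths fields listed above.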

Reference

Langford, I.H., Bentham, G. and McDonald, A. (1998). Multilevel modelling of geographically aggregated health data: a case study on malignant melanoma mortality and UV exposure in the European community. Statistics in Medicine, 17: 41-58.


OXBOYS

Height measurements of 26 boys were taken on 9 occasions between the ages of 11 and 13 years. There are five fields in the data:

Individual ID
Age in years centred at 12 years
Height in cm
Occasion number
Season in decimal years

The models fitted are: Two-level growth curve models for repeated measures, including models with a serial correlation structure at level 1.

Reference

Goldstein, H., Healy, M. J. R., and Rasbash, J. (1994). Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13: 1643-1655.


SOCATT

The data come from the British Social Attitudes (BSA) Survey, which started in 1983. The eligible persons were all adults aged 18 or over living in private households in Britain. The data consist of the complete results for 264 respondents out of 410. The variables are defined as follows:

District ID
Respondent code (within district)
Year code
1 = 1983
2 = 1984
3 = 1985
4 = 1986
Number of positive answers to seven questions
0 = none
1 = yes for one item
2 = yes for two items
3 = yes for three items
4 = yes for four items
5 = yes for five items
6 = yes for six items
7 = yes for seven items
Note: categories 0 and 1 are combined for analyses.
Party chosen
1 = Conservative
2 = Labour
3 = Lib/SDP/Alliance
4 = others
5 = none
Self assessed social class
1 = middle
2 = upper working
3 = lower working
Gender
1 = male
2 = female
Age in years
Religion
1 = Roman Catholic
2 = Protestant/Church of England
3 = others
4 = none

Models fitted are: Multinomial response (ordered or unordered) models with repeated measures from respondents over 4 years.
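The note on the response variable says that categories 0 and 1 are combined for analyses. A minimal sketch of that recoding follows; labelling the merged category as 1 is an assumption, as is the function name.

```python
def collapse_response(n_positive):
    """Merge response categories 0 and 1 into a single category.

    The merged category is labelled 1 here, which is an assumption;
    categories 2 to 7 are left unchanged.
    """
    if not 0 <= n_positive <= 7:
        raise ValueError(f"unexpected response value: {n_positive}")
    return max(n_positive, 1)
```

After this recoding the response has seven categories (1 to 7) instead of eight.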

Reference

McGrath, K. and Waterton, J. (1986). British Social Attitudes 1983-1986 panel survey. London, Social and Community Planning Research.

A variety of software can be used to unzip the files, e.g. WinZip or CAM UnZip.
