Datasets used in reviews
The following datasets are used in the reviews.
- 2LEV-XC (Cross classified model) - Data on pupil attainment
- BANG (Binary response model) - Data on contraceptive use
- CHEM97 (2/3-level Normal response models) - This file consists of A/AS-level examination data from England
- EXAM (2-level Normal response + complex level 1 variation) - GCSE Examination data from London
- GCSEMV (Bivariate Normal response model) - Two examination scores on a set of students, including missing responses
- META (Meta analysis model) - Data on teacher expectations of students
- MMMEC (Poisson response) - Study of Malignant Melanoma Mortality in Europe
- OXBOYS (Normal response repeated measures) - Repeated measures of height of boys
- SOCATT (Multicategory response model) - Longitudinal data on social attitudes
- Download these datasets (zip, 0.3 MB)
2LEV-XC
The data are on 3,435 children who attended 148 primary schools and 19 secondary schools in Scotland. There are 11 fields in the data set, of which the following are used:
- VRQ: A verbal reasoning score from tests pupils took when they entered secondary school
- ATTAIN: Attainment score of pupils at age 16
- PID: Primary school identifying code
- SEX: Pupil's gender
  0 = boy
  1 = girl
- SC: Pupil's social class scale (continuous scale score from low to high social class)
- SID: Secondary school identifying code
Model to be fitted: Cross-classified model.
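Pupils are cross-classified when primary schools (PID) are not nested within secondary schools (SID), that is, when pupils from one primary school go on to more than one secondary school. A minimal Python sketch of that check follows; the (PID, SID) pairs are made up for illustration and are not taken from the data file, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (PID, SID) pairs, one per pupil; not taken from the 2LEV-XC file.
pupils = [(1, 10), (1, 11), (2, 10), (2, 10), (3, 11)]

def secondary_schools_per_primary(pairs):
    """Map each primary school ID to the set of secondary schools its pupils attend."""
    feeds = defaultdict(set)
    for pid, sid in pairs:
        feeds[pid].add(sid)
    return dict(feeds)

def is_cross_classified(pairs):
    """True if any primary school sends pupils to more than one secondary school."""
    feeds = secondary_schools_per_primary(pairs)
    return any(len(sids) > 1 for sids in feeds.values())

# Primary school 1 feeds secondary schools 10 and 11, so the structure is crossed.
print(is_cross_classified(pupils))
```

If the check were false for the real data, a purely nested model would be enough.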
Reference
Paterson, L. (1991). Socio economic status and educational attainment: a multidimensional and multilevel study. Evaluation and Research in Education, 5: 97-121.
BANG
These data come from the 1988 Bangladesh Fertility Survey. The file consists of a subsample of 1,934 women grouped in 60 districts. The variables are defined as follows:
- WOMAN: identifying code for each woman
- DISTRICT: identifying code for each district
- USE: Contraceptive use at time of survey: Response
  1 = using contraception
  0 = not using contraception
- LC: Number of living children at time of survey
  1 = None
  2 = 1
  3 = 2
  4 = 3 or more
- AGE: Age of woman at time of survey (in years), centred around mean
- URBAN: Type of region of residence
  1 = Urban
  0 = Rural
- CONSTANT = 1
Model to be fitted: Two-level main effects model including all covariates, with logistic and other link functions.
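A link function maps the linear predictor to the probability of using contraception. The sketch below shows three inverse links in Python. The logit is named above; the probit and complementary log-log are assumed here as common examples of "other" links, since the page does not say which ones are used. Treating LC = 1 (no children) as the reference category for dummy coding is also an assumption.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse of the probit link: the standard Normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def lc_dummies(lc):
    """Dummy-code LC (1-4) against an assumed reference category, 1 = no children."""
    if lc not in (1, 2, 3, 4):
        raise ValueError(f"unexpected LC code: {lc}")
    return [int(lc == k) for k in (2, 3, 4)]

# A linear predictor of 0 gives probability 0.5 under logit and probit,
# but about 0.632 under the asymmetric complementary log-log link.
for inverse_link in (inv_logit, inv_probit, inv_cloglog):
    print(inverse_link.__name__, round(inverse_link(0.0), 3))
```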
Reference
Huq, N. M., and Cleland, J. (1990). Bangladesh Fertility Survey 1989 (Main Report). Dhaka: National Institute of Population Research and Training.
CHEM97
The file consists of 7 fields in the following order:
- ID for Local Education Authorities
- ID for schools
- ID for individuals
- Point score on A-level Chemistry in 1997
- Gender
  0 = M
  1 = F
- Age in months, centred at 222 months or 18.5 years
- Average GCSE score of individual, centred at the mean
This data set is relatively large with 31,022 individuals in 2,280 schools. Note that while this is used, illustratively, to fit Normal response models, the distribution of the response is not well described by a Normal distribution.
The following models are fitted: Three-level variance component and random slope models for continuous Normal outcomes, with main effects and interactions between GCSE and gender, or between GCSE and age.
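An interaction term is the product of two predictors. The Python sketch below builds the fixed-part predictors for one student from the centred fields described above. The function name, argument names, and column order are illustrative assumptions. A given model would keep one interaction column or the other, as described above.

```python
def fixed_part_row(gcse_centred, female, age_centred):
    """Return fixed-part predictors for one student.

    Order: constant, GCSE, gender, age, GCSE x gender, GCSE x age.
    """
    return [
        1.0,
        float(gcse_centred),
        float(female),
        float(age_centred),
        float(gcse_centred * female),
        float(gcse_centred * age_centred),
    ]

# Hypothetical student: GCSE 0.5 above the mean, female, aged 228 months.
# Age is centred at 222 months, so the centred age is 6.
row = fixed_part_row(gcse_centred=0.5, female=1, age_centred=228 - 222)
print(row)
```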
Reference
Fielding, A., Yang, M., and Goldstein, H. (2003). Multilevel ordinal models for examination grades. Statistical Modelling, 3 (2): 127-153.
EXAM
This set of data consists of 4,059 students from 65 schools in Inner London.
The variables in order are:
- School ID
- Student ID
- Normalised exam score as outcome variable
- Constant vector = 1
- Standardised LR test score (STANDLRT)
- Student gender
  0 = boys
  1 = girls
- School gender
  1 = mixed school
  2 = boys school
  3 = girls school
- School average of intake score
- Student level Verbal Reasoning (VR) score band at intake
  1 = bottom 25%
  2 = mid 50%
  3 = top 25%
- Band of student's intake score
  1 = bottom 25%
  2 = mid 50%
  3 = top 25%
The dataset is used to fit 2-level Normal response models including complex level 1 variation for gender and LRT score.
Reference
Goldstein, H., Rasbash, J., et al. (1993). A multilevel analysis of school examination results. Oxford Review of Education, 19: 425-433.
GCSEMV
The data contain GCSE exam scores on a science subject. Two components of the exam were chosen as outcome variables: written paper and coursework. There are 1,905 students from 73 schools in England. Missing values are coded as -1. The five fields are as follows:
- School ID
- Student ID
- Gender of student
  0 = boy
  1 = girl
- Total score of written paper
- Total score of coursework paper
Models fitted are bivariate Normal response models.
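Because missing scores are coded as -1, they need to be recoded before analysis so that they are not treated as real scores. A minimal Python sketch follows, assuming the five fields arrive as a list of numbers in the order given above. The function names, dictionary keys, and example row are hypothetical.

```python
MISSING = -1  # missing-value code stated for GCSEMV

def parse_record(fields):
    """Turn five raw fields into a dict, mapping the -1 code on scores to None."""
    school, student, gender, written, coursework = fields
    return {
        "school": school,
        "student": student,
        "girl": gender,
        "written": None if written == MISSING else written,
        "coursework": None if coursework == MISSING else coursework,
    }

def responses_present(record):
    """List which of the two responses were observed for this student."""
    return [name for name in ("written", "coursework") if record[name] is not None]

# Hypothetical row with the coursework score missing.
rec = parse_record([1, 16, 0, 23, -1])
print(responses_present(rec))
```

A student with only one observed response can still contribute that response to a bivariate model.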
Reference
Multivariate response models. (2000). In Rasbash, J., et al., A user’s guide to MLwiN, Institute of Education, University of London.
META
These data have been used to illustrate meta-analysis. They come from a collection of 19 published studies on the effect of teacher expectancy on pupil IQ. The file contains four fields:
- Effect size estimate
- SE of effect size estimate
- Weeks of prior contact
- Study code
Model to be fitted: Two-level meta-analysis model.
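The effect size estimate and its standard error are the two inputs any meta-analysis needs. As a simpler point of comparison with the two-level model, the Python sketch below computes the fixed-effect pooled estimate, which weights each study by the inverse of its sampling variance and assumes no variation in true effects between studies. This is not the two-level model itself. The three studies are made-up values, not the 19 studies in the file.

```python
import math

def pooled_effect(effects, ses):
    """Inverse-variance weighted mean of study effects and its standard error."""
    if len(effects) != len(ses) or not effects:
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

# Hypothetical effect sizes and standard errors for three studies.
est, se = pooled_effect([0.10, 0.30, -0.05], [0.10, 0.20, 0.10])
print(round(est, 4), round(se, 4))
```

The two-level model adds a between-study variance to each study's sampling variance, which pulls the weights closer together.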
Reference
Raudenbush, S., Bryk, A., Cheong, Y. F., and Congdon, R. (2001). HLM 5: Hierarchical Linear and Nonlinear Modelling. Lincolnwood: SSI.
MMMEC
This set of data comes from the study of Malignant Melanoma Mortality in the European Community associated with the impact of UV radiation exposure. Seven fields are included in the dataset.
- Nation
  1 = Belgium
  2 = W. Germany
  3 = Denmark
  4 = France
  5 = UK
  6 = Italy
  7 = Ireland
  8 = Luxembourg
  9 = Netherlands
- Region ID
- County ID
- Number of male deaths due to MM during 1971-1980
- Number of expected deaths
- Constant = 1
- Centred measure of the UVB dose reaching the earth's surface in each county
The following models are fitted to the data:
a. Three-level Poisson model
b. Two-level Poisson model with dummy variables for the level 3 (Nation) units
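In Poisson models of this kind, the expected number of deaths usually enters as an offset, so that the model describes the ratio of observed to expected deaths. The Python sketch below assumes a log link with log(expected) as the offset. The page does not state this, and the function names and example county are hypothetical.

```python
import math

def smr(observed, expected):
    """Standardised mortality ratio: observed deaths over expected deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected

def poisson_mean(expected, linear_predictor):
    """Poisson mean under a log link with log(expected) as the offset (assumed)."""
    return math.exp(math.log(expected) + linear_predictor)

# Hypothetical county: 12 observed deaths against 8 expected.
print(smr(12, 8))
# A linear predictor of 0 gives back the expected count (up to rounding),
# so exp(linear predictor) acts as a relative risk.
print(round(poisson_mean(8.0, 0.0), 6))
```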
Reference
Langford, I.H., Bentham, G. and McDonald, A. (1998). Multilevel modelling of geographically aggregated health data: a case study on malignant melanoma mortality and UV exposure in the European community. Statistics in Medicine, 17: 41-58.
OXBOYS
Height measurements of 26 boys were taken on 9 occasions between the ages of 11 and 13 years. There are five fields in the data:
- Individual ID
- Age in years centred at 12 years
- Height in cm
- Occasion number
- Season in decimal years
The models fitted are: Two-level repeated measures growth curve models, including models with a serial correlation structure at level 1.
Reference
Goldstein, H., Healy, M. J. R., and Rasbash, J. (1994). Multilevel time series models with applications to repeated measures data. Statistics in Medicine, 13: 1643-1655.
SOCATT
The data come from the British Social Attitudes (BSA) Survey, which started in 1983. The eligible persons were all adults aged 18 or over living in private households in Britain. The data consist of complete results for 264 of 410 respondents. The variables are defined as follows:
- District ID
- Respondent code (within district)
- Year code
  1 = 1983
  2 = 1984
  3 = 1985
  4 = 1986
- Number of positive answers to seven questions
  0 = none
  1 = yes for one item
  2 = yes for two items
  3 = yes for three items
  4 = yes for four items
  5 = yes for five items
  6 = yes for six items
  7 = yes for seven items
  Note: 0 and 1 combined for analyses.
- Party chosen
  1 = Conservative
  2 = Labour
  3 = Lib/SDP/Alliance
  4 = others
  5 = none
- Self assessed social class
  1 = middle
  2 = upper working
  3 = lower working
- Gender
  1 = male
  2 = female
- Age in years
- Religion
  1 = Roman Catholic
  2 = Protestant/Church of England
  3 = others
  4 = none
Models fitted are: Multinomial response (ordered or unordered) models with repeated measures from respondents over 4 years.
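The response is analysed with categories 0 and 1 combined, which leaves seven categories. A minimal Python sketch of that recoding follows. Labelling the merged category as 1 is an assumption, and the function name is hypothetical.

```python
def recode_score(n_positive):
    """Collapse the 0-7 count so that 0 and 1 share one category (returns 1-7)."""
    if n_positive not in range(8):
        raise ValueError(f"score out of range: {n_positive}")
    return max(n_positive, 1)

# Codes 0 and 1 both map to 1; codes 2 to 7 are unchanged.
print([recode_score(k) for k in range(8)])
```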
Reference
McGrath, K. and Waterton, J. (1986). British Social Attitudes 1983-1986 panel survey. London, Social and Community Planning Research.
There is a variety of software you can use to unzip files, e.g. WinZip or CAM UnZip.