Categorical variables

How do I update the category labels for a categorical variable?

Highlight the variable in the Names window and press the Toggle Categorical button at the top of the window twice. This switches the variable to continuous and back to categorical, and the category names are re-created when it is switched back. (If you want to give the categories specific names, you will need to re-enter these by selecting View in the Categories group.) The variable will now have one category per distinct value, and any categories which had no observations will have disappeared.

The commands to do this are CATN 0 C (changes the variable C from categorical to continuous) and NTOC C (changes the variable from continuous to categorical, assigning default category labels to the codes). To give specific labels to the categories, instead of NTOC use CATN as described below.

An alternative is to use the CATN command to change only the categories that are wrong. An unwanted category (usually one with no observations) can be removed using CATN 0 C N, where C is the name of the column containing the categorical variable and N is the number of the unwanted category. This command removes category number N from the list of categories for the variable. For example, if you are working with the tutorial dataset and have deleted all records for which vrband = vb3, you could type CATN 0 'vrband' 3, and vrband would then have two categories, vb1 and vb2.

To assign a category label to a code, use CATN 1 C N name, where C is again the name of the column containing the categorical variable, N is the code to which you wish to assign a label, and name is the label you wish to assign. For example, suppose you have recoded vrband so that all the observations that were in category 3 now have code 4, but you still wish these observations to be in category vb3. (This is not generally a useful thing to do; the example only illustrates how the command works.) You would type CATN 0 'vrband' 3 to remove the category with code 3, and then CATN 1 'vrband' 4 'vb3' to assign the label vb3 to the category with code 4.

Note that you can change several categories at once. For example, to assign the label vb3 to code 4 and vb2 to code 5, you can use CATN 1 'vrband' 4 'vb3' 5 'vb2'.
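
The effect of the two forms of CATN can be pictured with a small Python sketch. This is a toy model only, not MLwiN code: it assumes the category list behaves like a table mapping codes to labels, and the function names remove_category and assign_labels are hypothetical.

```python
# Toy model only: a categorical variable's category list as {code: label}.
# Function names are hypothetical; MLwiN's internals may differ.

def remove_category(categories, code):
    """Mimic CATN 0 C N: drop category `code` from the category list."""
    updated = dict(categories)
    updated.pop(code, None)  # removing an absent code is a no-op here
    return updated

def assign_labels(categories, *pairs):
    """Mimic CATN 1 C N name [N name ...]: attach labels to codes."""
    if len(pairs) % 2 != 0:
        raise ValueError("expected alternating code, label arguments")
    updated = dict(categories)
    for code, label in zip(pairs[0::2], pairs[1::2]):
        updated[code] = label
    return updated

vrband = {1: "vb1", 2: "vb2", 3: "vb3"}
vrband = remove_category(vrband, 3)       # like CATN 0 'vrband' 3
vrband = assign_labels(vrband, 4, "vb3")  # like CATN 1 'vrband' 4 'vb3'
print(vrband)
```

In this model, passing several code and label pairs to assign_labels corresponds to changing several categories in one CATN 1 command.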

For documentation of CATN, see p18 of the Command Manual, and for documentation of NTOC see p22.

I've recoded a categorical variable, but the list of category codes I see when I press Categories has not been updated

When you recode a categorical variable, the category code information is not automatically updated. For example, if you recode a variable so as to collapse 4 categories into 3, the variable will still be considered to have 4 categories (though one will have no observations). Similarly, if you recode all observations in category 3 to have the value 10, and you do not already have a category with code 10, then category 3 will keep its original code (and will have no observations), and observations with code 10 will not be considered to belong to any category. To update the category information after recoding, follow the procedure described in the FAQ above: How do I update the category labels for a categorical variable?
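
Why the category list goes stale can be illustrated with a short Python sketch. It is a toy model under the assumption that the data values and the category list are stored separately; the variable names and the labels c1 to c4 are made up for illustration and do not come from MLwiN.

```python
# Toy model only; names and labels are hypothetical, MLwiN's internals may differ.
from collections import Counter

values = [1, 2, 3, 3, 4]                           # observed codes
categories = {1: "c1", 2: "c2", 3: "c3", 4: "c4"}  # category list

# Recode every 3 to 10: the data change, but the category list does not.
values = [10 if v == 3 else v for v in values]

counts = Counter(values)
empty = [code for code in categories if counts[code] == 0]
uncategorised = sorted(set(values) - set(categories))

print(empty)          # codes still listed but with no observations
print(uncategorised)  # codes present in the data but in no category

# Re-creating the categories from the data gives one per distinct value.
rebuilt_codes = sorted(set(values))
print(rebuilt_codes)
```

The last step corresponds to the idea behind toggling the variable to continuous and back: the category list is rebuilt from the values actually present.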

How do I get rid of an unneeded dummy variable/response category?

Example question: I entered a categorical variable into my model as an explanatory variable, and although there are no observations in one of the categories of this variable, a dummy variable for this category was still included in the model. How can I remove it without removing all the dummy variables for this categorical variable?

When you enter a categorical variable as an explanatory variable (or as the response of a multinomial model), MLwiN enters all the categories of that variable, even if some categories have no observations. A variable can have categories with no observations if it already had such categories in the original dataset you imported it from (perhaps a Stata or SPSS worksheet), if you have recoded the variable so that some categories are combined (for example, so that all observations in Category 3 are now in Category 2), or if you have deleted all records in that category of the variable from the dataset.
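
The following Python sketch illustrates how an empty category can still produce a dummy variable. It is a simplified toy model, not MLwiN's procedure: it assumes dummies are generated from the category list rather than from the observed values, it ignores reference-category handling, and the function name dummy_columns is hypothetical.

```python
# Toy model only; function and variable names are hypothetical.
# Reference-category handling is deliberately ignored here.

def dummy_columns(values, categories):
    """Build one 0/1 indicator column per listed category."""
    return {
        label: [1 if v == code else 0 for v in values]
        for code, label in categories.items()
    }

values = [1, 2, 2, 1]                        # no observation has code 3
categories = {1: "vb1", 2: "vb2", 3: "vb3"}  # but category 3 is still listed

dummies = dummy_columns(values, categories)

# Dummies that are zero for every observation come from empty categories.
all_zero = [label for label, col in dummies.items() if not any(col)]
print(all_zero)
```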

An unneeded dummy explanatory variable can be removed from the model as described in the FAQ: How can I constrain fixed parameters to zero? Alternatively, before entering the variable into the model, you can update the category labels as described in the FAQ: How do I update the category labels for a categorical variable? When you then enter the variable into a model (whether as an explanatory variable or as the response of a multinomial model), only the new set of categories will be entered: any categories which had no observations will no longer be included. Note that if, after following this procedure, you recode the variable or alter the data in any other way so that some categories again have no observations, those categories will not automatically disappear: you will need to update the category labels again.

A final possibility, if you know the code of the category which has no observations, is to use CATN to remove just the category/categories with no observations. How to do this is described in the FAQ How do I update the category labels for a categorical variable?; alternatively see p18 of the Command Manual. Since this removes the category from the variable, it will mean both that no dummy for that category is included when the categorical variable is entered as an explanatory variable and that the category will not be included when the categorical variable is used as the response of a multinomial model.

Other questions about categorical variables

Go to FAQ: How do I test if an interaction between a continuous and a categorical variable is significant?

