How to Use Factors in Statistical Models in R

Answer

To use factors in statistical models in R, include them as predictor variables in the model formula. R expands each factor into dummy (indicator) variables, one for every level except the reference level. By default the reference level is the first level, which is the alphabetically first value unless you set the order yourself. This lets you fit models such as linear regression or generalized linear models to categorical data.
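As a minimal sketch of that expansion, the snippet below builds the design matrix for a single factor with model.matrix(). The labels mirror the group column used in the first example; the variable names group_levels and dummy_names are chosen here for illustration only.

```r
# A factor with three levels; levels are sorted alphabetically by default.
group <- factor(c("A", "A", "B", "B", "C", "C", "A", "B"))
group_levels <- levels(group)   # the first level ("A") is the reference

# Design matrix R builds when `group` appears on the right-hand side of a formula.
X <- model.matrix(~ group)
dummy_names <- colnames(X)      # intercept plus one column per non-reference level

print(group_levels)
print(dummy_names)
print(head(X, 3))               # rows for the first three observations
```

Each non-reference column is 1 for observations in that level and 0 otherwise, so an observation in the reference level has 0 in every dummy column.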

✐ Examples

1 Using Factors in a Linear Regression Model

In this example,

We start by creating a data frame named data which contains two columns: height (numeric) and group (factor). The height column represents the height of individuals, and the group column represents different groups ('A', 'B', 'C').
Next, we use the factor() function to convert the group column to a factor. lm() would coerce a character column on its own, but converting explicitly makes the categorical intent clear and lets you control the level order.
We then fit a linear regression model using the lm() function. The model predicts height using group as the predictor variable. We assign the result to a variable named model.
We use the summary() function to print the summary of the model to the console. The summary lists the coefficient estimates, standard errors, and p-values. With group A as the reference level, the intercept is the mean height of group A, and groupB and groupC are the differences between those groups' mean heights and the mean of group A.

R Program

data <- data.frame(height = c(160, 170, 165, 175, 180, 169, 172, 178),
                    group = c('A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'))
data$group <- factor(data$group)
model <- lm(height ~ group, data = data)
summary(model)

Output

Call:
lm(formula = height ~ group, data = data)

Residuals:
     1      2      3      4      5      6      7      8 
-7.333  2.667 -7.667  2.333  5.500 -5.500  4.667  5.333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  167.333      3.965  42.201 1.41e-07 ***
groupB         5.333      5.608   0.951    0.385    
groupC         7.167      6.269   1.143    0.305    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.868 on 5 degrees of freedom
Multiple R-squared:  0.2365, Adjusted R-squared:  -0.06893 
F-statistic: 0.7743 on 2 and 5 DF,  p-value: 0.5094
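To connect the coefficients to the data, the sketch below recomputes the group means with tapply() and compares them with coef(model). It reuses the data from this example; group_means is a name introduced here for illustration.

```r
data <- data.frame(height = c(160, 170, 165, 175, 180, 169, 172, 178),
                   group = c('A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'))
data$group <- factor(data$group)
model <- lm(height ~ group, data = data)

# Mean height within each group, computed directly from the data.
group_means <- tapply(data$height, data$group, mean)
coefs <- coef(model)

# The intercept is the mean of the reference level (A);
# the other coefficients are differences from that mean.
print(group_means)
print(coefs)
print(all.equal(unname(coefs["(Intercept)"]), unname(group_means["A"])))
print(all.equal(unname(coefs["groupB"]), unname(group_means["B"] - group_means["A"])))
print(all.equal(unname(coefs["groupC"]), unname(group_means["C"] - group_means["A"])))
```

If a different baseline is wanted, relevel(data$group, ref = "B") before fitting makes B the reference level, and the coefficients are then differences from the mean of B.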

2 Using Factors in a Logistic Regression Model

In this example,

We start by creating a data frame named data which contains two columns: outcome (binary factor) and treatment (factor). The outcome column represents a binary outcome (0 or 1), and the treatment column represents different treatment groups ('Placebo', 'DrugA', 'DrugB').
Next, we use the factor() function to convert both outcome and treatment to factors. For a factor response, glm() with family = binomial treats the first level ('0') as failure and the other level ('1') as success.
We then fit a logistic regression model using the glm() function. The model predicts outcome using treatment as the predictor variable and specifies the family as binomial. We assign the result to a variable named model.
We use the summary() function to print the summary of the model to the console. The summary lists the coefficient estimates, standard errors, and p-values. Each treatment coefficient is a difference in the log-odds of outcome 1 relative to the reference level, which is the first factor level (DrugA here, because levels are sorted alphabetically).

R Program

data <- data.frame(outcome = c(0, 1, 1, 0, 1, 0, 0, 1),
                    treatment = c('Placebo', 'DrugA', 'DrugB', 'Placebo', 'DrugA', 'DrugB', 'Placebo', 'DrugA'))
data$outcome <- factor(data$outcome)
data$treatment <- factor(data$treatment)
model <- glm(outcome ~ treatment, data = data, family = binomial)
summary(model)

Output

With this data the coefficient rows are (Intercept), treatmentDrugB, and treatmentPlacebo, because DrugA is the reference level. Every Placebo outcome is 0 and every DrugA outcome is 1, so those two groups are perfectly separated: the fitted log-odds for them are pushed toward minus and plus infinity, and summary() reports coefficients that are large in magnitude, with far larger standard errors and p-values close to 1. The coefficient table is therefore not interpretable for this data. The deviance lines are still well defined:

    Null deviance: 11.0904  on 7  degrees of freedom
Residual deviance:  2.7726  on 5  degrees of freedom
AIC: 8.7726

To obtain interpretable log-odds, use data in which every treatment group contains both outcomes.
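Before interpreting a logistic fit, it can help to check the level order of the predictor and cross-tabulate it against the outcome. The following is a sketch using the data from this example; relevel() is used on the assumption that Placebo is the intended baseline, and the names default_levels, new_levels, and counts are illustrative.

```r
data <- data.frame(outcome = c(0, 1, 1, 0, 1, 0, 0, 1),
                   treatment = c('Placebo', 'DrugA', 'DrugB', 'Placebo',
                                 'DrugA', 'DrugB', 'Placebo', 'DrugA'))
data$outcome <- factor(data$outcome)
data$treatment <- factor(data$treatment)

# Default level order is alphabetical, so DrugA is the reference level.
default_levels <- levels(data$treatment)

# Make Placebo the reference level (assumed here to be the intended baseline).
data$treatment <- relevel(data$treatment, ref = "Placebo")
new_levels <- levels(data$treatment)

# Cross-tabulate treatment against outcome to spot empty cells.
counts <- table(treatment = data$treatment, outcome = data$outcome)

print(default_levels)
print(new_levels)
print(counts)
```

A zero cell in this table means that a group contains only one outcome value, in which case the log-odds for that group cannot be estimated reliably.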

3 Using Factors in an Analysis of Variance (ANOVA)

In this example,

We start by creating a data frame named data which contains two columns: score (numeric) and group (factor). The score column represents test scores, and the group column represents different experimental groups ('Control', 'Treatment1', 'Treatment2').
Next, we use the factor() function to convert the group column to a factor. aov() would coerce a character column on its own, but converting explicitly makes the categorical intent clear and lets you control the level order.
We then fit an ANOVA model using the aov() function. The model predicts score using group as the predictor variable. We assign the result to a variable named model.
We use the summary() function to print the summary of the ANOVA model to the console. This summary includes information about the F-statistic, degrees of freedom, and p-values, allowing us to interpret the effect of the group levels on the scores.

R Program

data <- data.frame(score = c(85, 88, 90, 78, 80, 83, 79, 77, 82),
                    group = c('Control', 'Control', 'Control', 'Treatment1', 'Treatment1', 'Treatment1', 'Treatment2', 'Treatment2', 'Treatment2'))
data$group <- factor(data$group)
model <- aov(score ~ group, data = data)
summary(model)

Output

            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  124.2  62.111   9.807 0.0129 *
Residuals    6   38.0   6.333                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
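A one-way ANOVA on a factor is the same linear model that lm() fits, so the F-statistic from aov() should match the overall F-statistic from lm(). The sketch below checks this on the data from this example; fit_aov, fit_lm, f_aov, and f_lm are names introduced here for illustration.

```r
data <- data.frame(score = c(85, 88, 90, 78, 80, 83, 79, 77, 82),
                   group = c('Control', 'Control', 'Control',
                             'Treatment1', 'Treatment1', 'Treatment1',
                             'Treatment2', 'Treatment2', 'Treatment2'))
data$group <- factor(data$group)

fit_aov <- aov(score ~ group, data = data)
fit_lm  <- lm(score ~ group, data = data)

# F value for the group term, taken from the first row of the ANOVA table.
f_aov <- summary(fit_aov)[[1]][["F value"]][1]

# Overall F-statistic reported by summary() of the linear model.
f_lm <- unname(summary(fit_lm)$fstatistic["value"])

print(f_aov)
print(f_lm)
print(all.equal(f_aov, f_lm))
```

The lm() summary additionally shows how each treatment group differs from the reference level (Control), which the ANOVA table alone does not report.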

Summary

In this tutorial, we learned how to use factors in statistical models in R, with examples covering linear regression, logistic regression, and ANOVA.
