To use factors in statistical models in R, you typically include them as predictor variables in the model formula. Model-fitting functions such as `lm()`, `glm()` and `aov()` expand a factor into indicator (dummy) variables, one for each level except the reference level. By default, the reference level is the first level of the factor, and each coefficient compares its level with that reference level. This lets you include categorical data in models such as linear regression, generalized linear models and analysis of variance.
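The sketch below makes this dummy coding visible. It passes a small, made-up factor to base R's `model.matrix()`, which builds the design matrix that a model formula produces. The name `group` and its values are illustrative only. Under R's default treatment contrasts, the matrix should contain an intercept column plus one 0/1 column for each non-reference level.

```r
# A small illustrative factor; "A" sorts first, so it becomes the reference level
group <- factor(c("A", "B", "C", "A", "B"))
print(levels(group))

# model.matrix() shows the design matrix a formula builds:
# an intercept column plus one 0/1 indicator column per non-reference level
mm <- model.matrix(~ group)
print(mm)
```

Rows belonging to level A have zeros in both indicator columns. That is why the intercept corresponds to the reference level.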
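In this example, we create a data frame named `data` with two columns: `height` (numeric) and `group` (the grouping variable). The `height` column holds the heights of individuals, and the `group` column assigns each individual to one of three groups ('A', 'B', 'C').

Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so `group` starts out as a character vector. We use the `factor()` function to make it an explicit factor. This fixes its levels (A, B, C) and therefore the reference level, which is 'A'. `lm()` would also treat a character column as categorical, but an explicit `factor()` call is essential when groups are coded as numbers, which R would otherwise treat as a continuous predictor.

Next, we fit a linear regression model with the `lm()` function. The model predicts `height` using `group` as the predictor variable, and we assign the result to a variable named `model`. Finally, we use the `summary()` function to print a summary of the model to the console. The summary includes the coefficient estimates, standard errors, t-statistics and p-values, which let us interpret the effect of each factor level on the response variable.

```r
# Heights of eight individuals in three groups
data <- data.frame(height = c(160, 170, 165, 175, 180, 169, 172, 178),
                   group = c('A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'))
# Convert the grouping column to a factor (levels: A, B, C)
data$group <- factor(data$group)
# Fit a linear model with group as a categorical predictor
model <- lm(height ~ group, data = data)
summary(model)
```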
In the resulting coefficient table, the intercept (about 167.33) is the mean height of group A, the reference level. The `groupB` estimate (about 5.33) is the difference between the mean heights of groups B and A. The `groupC` estimate (about 7.17) is the difference between the mean heights of groups C and A. The model has 5 residual degrees of freedom (8 observations minus 3 coefficients), a residual standard error of about 6.87 and a multiple R-squared of about 0.24. With this little data, neither group difference is statistically significant.
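As a quick check of this interpretation, the sketch below recomputes the group means with `tapply()` and compares their differences with the fitted coefficients. It then uses `relevel()` to make C the reference level, which changes which comparisons the coefficients report without changing the fitted values. The choice of C and the name `model_c` are illustrative only.

```r
# Recreate the data and model from the example above
data <- data.frame(height = c(160, 170, 165, 175, 180, 169, 172, 178),
                   group = factor(c('A', 'A', 'B', 'B', 'C', 'C', 'A', 'B')))
model <- lm(height ~ group, data = data)

# Mean height per group
group_means <- tapply(data$height, data$group, mean)
print(group_means)

# With treatment contrasts, each non-reference coefficient is a difference of means
print(coef(model))
print(group_means[c("B", "C")] - group_means["A"])

# relevel() moves the chosen level to the front, making it the new reference
data$group <- relevel(data$group, ref = "C")
model_c <- lm(height ~ group, data = data)
print(coef(model_c))  # groupA and groupB are now measured relative to C
```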
In this example, we create a data frame named `data` with two columns: `outcome` (binary) and `treatment` (categorical). The `outcome` column records a binary outcome (0 or 1), and the `treatment` column records the treatment group ('Placebo', 'DrugA', 'DrugB').

We use the `factor()` function to make both `outcome` and `treatment` factors. For a binomial model, the first level of a factor response (here 0) is treated as failure and the other level as success. A numeric 0/1 response would work equally well. Note that `factor()` sorts the levels alphabetically by default, so the reference level of `treatment` is 'DrugA', not 'Placebo'.

Next, we fit a logistic regression model with the `glm()` function. The model predicts `outcome` using `treatment` as the predictor variable, specifies `family = binomial`, and we assign the result to a variable named `model`. Finally, we use the `summary()` function to print a summary of the model. The summary includes the coefficient estimates (on the log-odds scale), standard errors, z-statistics and p-values, which let us interpret the effect of each treatment level on the outcome.

```r
# Binary outcomes for eight patients across three treatment groups
data <- data.frame(outcome = c(0, 1, 1, 0, 1, 0, 0, 1),
                   treatment = c('Placebo', 'DrugA', 'DrugB', 'Placebo',
                                 'DrugA', 'DrugB', 'Placebo', 'DrugA'))
# Convert both columns to factors
data$outcome <- factor(data$outcome)
data$treatment <- factor(data$treatment)
# Fit a logistic regression with treatment as a categorical predictor
model <- glm(outcome ~ treatment, data = data, family = binomial)
summary(model)
```
Because the levels are sorted alphabetically, the coefficient table contains `treatmentDrugB` and `treatmentPlacebo`, both measured against the reference level DrugA. With this toy data the fit is degenerate: every DrugA patient has outcome 1 and every Placebo patient has outcome 0, so those groups are perfectly separated. `glm()` typically warns that fitted probabilities numerically 0 or 1 occurred, and the estimates and their standard errors become extremely large, so the z-tests are not meaningful. The null deviance is about 11.09 on 7 degrees of freedom. The residual deviance is about 2.77 on 5 degrees of freedom, almost all of it from the two DrugB patients, who have one outcome each. A realistic analysis would need more data.
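Before fitting a logistic model, a cross-tabulation of predictor against outcome can reveal separation, and `relevel()` can set a more natural baseline. The sketch below is a minimal illustration on the same toy data. It makes Placebo the reference level so that the drug coefficients would compare each drug with placebo.

```r
data <- data.frame(outcome = c(0, 1, 1, 0, 1, 0, 0, 1),
                   treatment = c('Placebo', 'DrugA', 'DrugB', 'Placebo',
                                 'DrugA', 'DrugB', 'Placebo', 'DrugA'))
data$outcome <- factor(data$outcome)
data$treatment <- factor(data$treatment)

# Default levels are sorted alphabetically, so DrugA is the reference level
print(levels(data$treatment))

# Cross-tabulate treatment against outcome: a zero cell signals separation
tab <- table(data$treatment, data$outcome)
print(tab)

# Make Placebo the reference level so drug effects are contrasts vs placebo
data$treatment <- relevel(data$treatment, ref = "Placebo")
print(levels(data$treatment))
```

After this change, refitting the same `glm()` call should yield coefficients named `treatmentDrugA` and `treatmentDrugB`, although the separation in this toy data remains.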
In this example, we create a data frame named `data` with two columns: `score` (numeric) and `group` (categorical). The `score` column holds test scores, and the `group` column records the experimental group ('Control', 'Treatment1', 'Treatment2').

We use the `factor()` function to make `group` an explicit factor, so R treats it as categorical data. Next, we fit a one-way analysis of variance (ANOVA) model with the `aov()` function. The model predicts `score` using `group` as the predictor variable, and we assign the result to a variable named `model`. Finally, we use the `summary()` function to print the ANOVA table. The table includes the degrees of freedom, sums of squares, mean squares, F statistic and p-value, which let us test whether the mean scores differ across groups.

```r
# Test scores for nine participants in three experimental groups
data <- data.frame(score = c(85, 88, 90, 78, 80, 83, 79, 77, 82),
                   group = c('Control', 'Control', 'Control',
                             'Treatment1', 'Treatment1', 'Treatment1',
                             'Treatment2', 'Treatment2', 'Treatment2'))
# Convert the grouping column to a factor
data$group <- factor(data$group)
# Fit a one-way ANOVA with group as the factor
model <- aov(score ~ group, data = data)
summary(model)
```
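For `group`, the ANOVA table reports 2 degrees of freedom and a sum of squares of about 124.22 (mean square about 62.11). The residuals have 6 degrees of freedom and a sum of squares of 38.00 (mean square about 6.33). The resulting F value is about 9.81, with a p-value of about 0.013. At the 0.05 level we therefore conclude that the mean scores are not all equal: the Control mean (about 87.7) is higher than both treatment means (about 80.3 and 79.3).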
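A significant F test says only that the group means are not all equal. One common follow-up for finding which pairs of groups differ is Tukey's honest significant difference procedure, available in base R as `TukeyHSD()`. The sketch below applies it to the same data. The `conf.level = 0.95` argument is simply the default, written out explicitly.

```r
data <- data.frame(score = c(85, 88, 90, 78, 80, 83, 79, 77, 82),
                   group = factor(c('Control', 'Control', 'Control',
                                    'Treatment1', 'Treatment1', 'Treatment1',
                                    'Treatment2', 'Treatment2', 'Treatment2')))
model <- aov(score ~ group, data = data)

# Group means, to see which groups drive the overall F test
print(tapply(data$score, data$group, mean))

# All pairwise differences between levels, with confidence intervals
# and p-values adjusted for multiple comparisons
tukey <- TukeyHSD(model, "group", conf.level = 0.95)
print(tukey)
```

Each row of `tukey$group` compares two levels ("Treatment1-Control" and so on) and gives the difference in means, its confidence interval and an adjusted p-value.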
In this tutorial, we learned how to use factors in statistical models in R, with worked examples of linear regression (`lm()`), logistic regression (`glm()`) and one-way ANOVA (`aov()`).