To use factors in regression analysis in R, you need to convert the categorical variables into factors and include them in your regression model. This allows R to treat these variables correctly in the analysis, creating appropriate dummy variables for the regression equation.
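To make the dummy-variable step concrete, here is a minimal sketch that uses base R's model.matrix() to display the design matrix lm() would build from a factor. The data frame df and its columns y and group are hypothetical, made up purely for illustration.

```r
# Hypothetical toy data, for illustration only (not from the examples in this tutorial)
df <- data.frame(
  y = c(10, 12, 15, 11),
  group = factor(c("a", "b", "c", "a"))
)

# model.matrix() returns the design matrix that lm() would construct
# for the same formula: an intercept plus one 0/1 column per
# non-reference level of the factor.
X <- model.matrix(y ~ group, data = df)
print(X)

# The first level ("a") is the reference level and gets no column of its own.
print(levels(df$group))
print(colnames(X))
```

A factor with k levels contributes k - 1 dummy columns under R's default treatment coding; the omitted level is absorbed into the intercept.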
In this example,

1. We create a data frame named data that includes the variables income and gender. The income variable is numeric, while the gender variable is categorical with values 'Male' and 'Female'.
2. We convert the gender variable to a factor using the factor() function. This ensures that R treats the gender variable as a categorical variable in the regression analysis.
3. We use the lm() function to create a linear regression model with income as the dependent variable and gender as the independent variable. We assign the result to a variable named model.
4. We use the summary() function to print the summary of the regression model. This provides detailed information about the regression coefficients, including the impact of the gender variable on income.

data <- data.frame(income = c(50000, 60000, 55000, 65000, 70000), gender = c('Male', 'Female', 'Female', 'Male', 'Female'))
data$gender <- factor(data$gender)
model <- lm(income ~ gender, data = data)
summary(model)
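Because factor() orders levels alphabetically by default, 'Female' is the reference level here, so the coefficient reported is genderMale rather than genderFemale. The summary output is approximately as follows (values rounded):

Call:
lm(formula = income ~ gender, data = data)

Residuals:
    1     2     3     4     5
-7500 -1667 -6667  7500  8333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    61667       5046  12.221  0.00118
genderMale     -4167       7979  -0.522  0.63762

Residual standard error: 8740 on 3 degrees of freedom
Multiple R-squared: 0.0833, Adjusted R-squared: -0.2222
F-statistic: 0.2727 on 1 and 3 DF, p-value: 0.6376

The intercept (about 61,667) is the mean income of the Female group, and genderMale (about -4,167) is the difference between the Male mean (57,500) and the Female mean. With only five observations, this difference is far from statistically significant.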
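As a check on how the coefficients relate to the group means, the following sketch recomputes the means with tapply() and then uses relevel() to switch the reference level. The column name gender_m and the object model_m are hypothetical names introduced only for this illustration.

```r
# Same data as in the example above
data <- data.frame(
  income = c(50000, 60000, 55000, 65000, 70000),
  gender = factor(c("Male", "Female", "Female", "Male", "Female"))
)

# Levels are sorted alphabetically by default, so "Female" is the reference
print(levels(data$gender))

# Mean income per group
group_means <- tapply(data$income, data$gender, mean)
print(group_means)

# Intercept = mean of the reference group; genderMale = Male mean - Female mean
model <- lm(income ~ gender, data = data)
print(coef(model))

# relevel() makes "Male" the reference instead (gender_m is a hypothetical column)
data$gender_m <- relevel(data$gender, ref = "Male")
model_m <- lm(income ~ gender_m, data = data)
print(coef(model_m))
```

Changing the reference level does not change the fitted values, only which group the intercept describes and the sign of the reported difference.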
In this example,

1. We create a data frame named data that includes the variables salary and education. The salary variable is numeric, while the education variable is categorical with values 'High School', 'Bachelor', and 'Master'.
2. We convert the education variable to a factor using the factor() function. This ensures that R treats the education variable as a categorical variable in the regression analysis.
3. We use the lm() function to create a linear regression model with salary as the dependent variable and education as the independent variable. We assign the result to a variable named model.
4. We use the summary() function to print the summary of the regression model. This provides detailed information about the regression coefficients, including the impact of different education levels on salary.

data <- data.frame(salary = c(40000, 50000, 60000, 70000, 80000), education = c('High School', 'Bachelor', 'Master', 'Bachelor', 'Master'))
data$education <- factor(data$education, levels = c('High School', 'Bachelor', 'Master'))
model <- lm(salary ~ education, data = data)
summary(model)
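Because 'High School' is listed first in the levels argument, it is the reference level, and each education coefficient is the difference between that level's mean salary and the High School mean. The summary output is approximately as follows (values rounded):

Call:
lm(formula = salary ~ education, data = data)

Residuals:
     1      2      3      4      5
     0 -10000 -10000  10000  10000

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)          40000      14142   2.828    0.106
educationBachelor    20000      17321   1.155    0.368
educationMaster      30000      17321   1.732    0.225

Residual standard error: 14140 on 2 degrees of freedom
Multiple R-squared: 0.6, Adjusted R-squared: 0.2
F-statistic: 1.5 on 2 and 2 DF, p-value: 0.4

The intercept (40,000) is the mean salary of the High School group, which contains a single observation. The Bachelor mean is 60,000 and the Master mean is 70,000, giving coefficients of 20,000 and 30,000. None of the differences is statistically significant in a sample this small.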
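One way to read a factor model without doing the coefficient arithmetic by hand is to ask it for a prediction at each level. The sketch below does that with predict() and also runs anova() for a single overall test of the education factor. The object names new_levels and pred are hypothetical helpers introduced for this illustration.

```r
# Same data as in the example above, with an explicit level order
data <- data.frame(
  salary = c(40000, 50000, 60000, 70000, 80000),
  education = factor(
    c("High School", "Bachelor", "Master", "Bachelor", "Master"),
    levels = c("High School", "Bachelor", "Master")
  )
)

model <- lm(salary ~ education, data = data)

# One row per level, using the same level order as the fitted factor
new_levels <- data.frame(
  education = factor(levels(data$education), levels = levels(data$education))
)

# For a model with a single factor predictor, the prediction for each
# level is that group's mean: 40000, 60000, 70000
pred <- predict(model, newdata = new_levels)
print(data.frame(education = new_levels$education, predicted = pred))

# anova() tests the factor as a whole rather than level by level
print(anova(model))
```

Building newdata from levels(data$education) keeps the level set identical to the one used for fitting, which predict() requires for factor predictors.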
In this example,

1. We create a data frame named data that includes the variables performance and department. The performance variable is numeric, while the department variable is categorical with values 'HR', 'Finance', and 'IT'.
2. We convert the department variable to a factor using the factor() function. This ensures that R treats the department variable as a categorical variable in the regression analysis.
3. We use the lm() function to create a linear regression model with performance as the dependent variable and department as the independent variable. We assign the result to a variable named model.
4. We use the summary() function to print the summary of the regression model. This provides detailed information about the regression coefficients, including the impact of different departments on performance.

data <- data.frame(performance = c(75, 80, 85, 90, 95), department = c('HR', 'Finance', 'IT', 'Finance', 'IT'))
data$department <- factor(data$department, levels = c('HR', 'Finance', 'IT'))
model <- lm(performance ~ department, data = data)
summary(model)
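Because 'HR' is listed first in the levels argument, it is the reference level, and each department coefficient is the difference between that department's mean performance and the HR mean. The summary output is approximately as follows (values rounded):

Call:
lm(formula = performance ~ department, data = data)

Residuals:
 1  2  3  4  5
 0 -5 -5  5  5

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         75.000      7.071  10.607  0.00877
departmentFinance   10.000      8.660   1.155  0.36754
departmentIT        15.000      8.660   1.732  0.22540

Residual standard error: 7.071 on 2 degrees of freedom
Multiple R-squared: 0.6, Adjusted R-squared: 0.2
F-statistic: 1.5 on 2 and 2 DF, p-value: 0.4

The intercept (75) is the mean performance of the HR group, which contains a single observation. The Finance mean is 85 and the IT mean is 90, giving coefficients of 10 and 15. Neither difference from HR is statistically significant in a sample this small.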
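To see the coding behind the department coefficients, the sketch below prints the contrast matrix R assigns to the factor (treatment coding, assuming R's default contrast options) and derives the IT versus Finance comparison, which the summary does not report directly. The name diff_it_fin is a hypothetical helper introduced for this illustration.

```r
# Same data as in the example above, with an explicit level order
data <- data.frame(
  performance = c(75, 80, 85, 90, 95),
  department = factor(
    c("HR", "Finance", "IT", "Finance", "IT"),
    levels = c("HR", "Finance", "IT")
  )
)

# Rows are levels, columns are the dummy variables; the reference
# level (HR) is the row of all zeros
cm <- contrasts(data$department)
print(cm)

model <- lm(performance ~ department, data = data)

# Each coefficient compares a department with HR, so the IT vs Finance
# difference is the difference between the two coefficients
diff_it_fin <- unname(coef(model)["departmentIT"] - coef(model)["departmentFinance"])
print(diff_it_fin)
```

This gives only the point estimate of the IT versus Finance difference; getting a standard error for it would take an extra step, such as refitting with Finance as the reference level.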
In this tutorial, we learned how to use factors in regression analysis in R, with detailed worked examples.