How to Use Factors in Regression Analysis in R

How do you use factors in regression analysis in R?

Answer

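To use factors in regression analysis in R, store each categorical variable as a factor and include it in the model formula. lm() converts character columns to factors on its own, but calling factor() explicitly lets you control the level order and therefore the reference level. Under the default treatment contrasts, a factor with k levels is expanded into k - 1 dummy (indicator) variables: the first level serves as the reference, and each coefficient is measured relative to it.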
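As a rough illustration of the dummy coding described above, model.matrix() shows the columns R builds from a factor before fitting. The data frame df and its values are made up for this sketch and are not part of the examples that follow.

```r
# Hypothetical toy data, used only to show the dummy coding
df <- data.frame(
  y     = c(10, 12, 15, 11, 14, 16),
  group = factor(c('A', 'B', 'C', 'A', 'B', 'C'))
)

levels(df$group)   # level order; the first level is the reference

# Design matrix that lm() would build for y ~ group:
# an intercept column plus one 0/1 column per non-reference level
X <- model.matrix(y ~ group, data = df)
print(X)
```

The reference level 'A' gets no column of its own; it is absorbed into the intercept, which is why the other coefficients are read as differences from 'A'.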

✐ Examples

1 Using a Factor Representing Gender in Regression Analysis

In this example,

We start by creating a data frame named data that includes the variables income and gender. The income variable is numeric, while the gender variable is categorical with values 'Male' and 'Female'.
We convert the gender variable to a factor using the factor() function. Because no levels are given, R orders them alphabetically ('Female', 'Male'), so 'Female' becomes the reference level in the regression.
We use the lm() function to create a linear regression model with income as the dependent variable and gender as the independent variable. We assign the result to a variable named model.
We use the summary() function to print the summary of the regression model. The intercept is the mean income of the reference level, and the gender coefficient estimates the difference in mean income between the other level and the reference level.

R Program

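# Sample data: numeric income and a categorical gender column
data <- data.frame(
  income = c(50000, 60000, 55000, 65000, 70000),
  gender = c('Male', 'Female', 'Female', 'Male', 'Female')
)
data$gender <- factor(data$gender)         # levels sorted alphabetically: Female, Male
model <- lm(income ~ gender, data = data)  # 'Female' is the reference level
summary(model)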

Output

Call:
lm(formula = income ~ gender, data = data)

Residuals:
    1     2     3     4     5 
-7500 -1667 -6667  7500  8333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)    61667       5046  12.221  0.00118 **
genderMale     -4167       7979  -0.522  0.63762   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8740 on 3 degrees of freedom
Multiple R-squared:  0.08333,    Adjusted R-squared:  -0.2222 
F-statistic: 0.2727 on 1 and 3 DF,  p-value: 0.6376
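By default, factor() orders levels alphabetically, so 'Female' is the reference level for this data. If you would rather measure the effect relative to 'Male', one option is relevel(). The sketch below reuses the same data; the name model_male_ref is only illustrative.

```r
# Same data as in the example above
data <- data.frame(
  income = c(50000, 60000, 55000, 65000, 70000),
  gender = factor(c('Male', 'Female', 'Female', 'Male', 'Female'))
)

# Make 'Male' the reference level instead of the alphabetical default
data$gender <- relevel(data$gender, ref = 'Male')

model_male_ref <- lm(income ~ gender, data = data)

# The intercept is now the mean income for 'Male';
# the remaining coefficient is named genderFemale
coef(model_male_ref)
```

Changing the reference level does not change the fit itself, only which group the intercept describes and the sign of the difference.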

2 Using a Factor Representing Education Level in Regression Analysis

In this example,

We start by creating a data frame named data that includes the variables salary and education. The salary variable is numeric, while the education variable is categorical with values 'High School', 'Bachelor', and 'Master'.
We convert the education variable to a factor using the factor() function with an explicit levels argument. This fixes the level order, so 'High School' is the first level and therefore the reference level in the regression.
We use the lm() function to create a linear regression model with salary as the dependent variable and education as the independent variable. We assign the result to a variable named model.
We use the summary() function to print the summary of the regression model. The intercept is the mean salary for 'High School', and each education coefficient estimates the difference in mean salary between that level and 'High School'.

R Program

data <- data.frame(
  salary = c(40000, 50000, 60000, 70000, 80000),
  education = c('High School', 'Bachelor', 'Master', 'Bachelor', 'Master')
)
# Explicit level order makes 'High School' the reference level
data$education <- factor(data$education, levels = c('High School', 'Bachelor', 'Master'))
model <- lm(salary ~ education, data = data)
summary(model)

Output

Call:
lm(formula = salary ~ education, data = data)

Residuals:
     1      2      3      4      5 
     0 -10000 -10000  10000  10000 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)          40000      14142   2.828    0.106
educationBachelor    20000      17321   1.155    0.368
educationMaster      30000      17321   1.732    0.225

Residual standard error: 14140 on 2 degrees of freedom
Multiple R-squared:  0.6,    Adjusted R-squared:  0.2 
F-statistic: 1.5 on 2 and 2 DF,  p-value: 0.4

Observation 1 is the only 'High School' row, so its fitted value equals its observed value and its residual is zero (R may print a tiny floating-point value instead of 0).
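With a single factor as the only predictor, the coefficients can be checked directly against the group means. The following sketch rebuilds the same data and compares the two; group_means is a helper name introduced here for illustration.

```r
# Same data as in the example above
data <- data.frame(
  salary    = c(40000, 50000, 60000, 70000, 80000),
  education = factor(
    c('High School', 'Bachelor', 'Master', 'Bachelor', 'Master'),
    levels = c('High School', 'Bachelor', 'Master')
  )
)
model <- lm(salary ~ education, data = data)

# Mean salary within each education level (named numeric vector)
group_means <- sapply(split(data$salary, data$education), mean)
group_means

# Intercept = mean of the reference level ('High School');
# each other coefficient = that level's mean minus the reference mean
coef(model)
group_means - group_means[['High School']]
```

This equivalence holds only when the factor is the sole predictor; once other variables enter the model, the coefficients become adjusted differences rather than raw differences in means.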

3 Using a Factor Representing Department in Regression Analysis

In this example,

We start by creating a data frame named data that includes the variables performance and department. The performance variable is numeric, while the department variable is categorical with values 'HR', 'Finance', and 'IT'.
We convert the department variable to a factor using the factor() function with an explicit levels argument. This fixes the level order, so 'HR' is the first level and therefore the reference level in the regression.
We use the lm() function to create a linear regression model with performance as the dependent variable and department as the independent variable. We assign the result to a variable named model.
We use the summary() function to print the summary of the regression model. The intercept is the mean performance for 'HR', and each department coefficient estimates the difference in mean performance between that department and 'HR'.

R Program

data <- data.frame(
  performance = c(75, 80, 85, 90, 95),
  department = c('HR', 'Finance', 'IT', 'Finance', 'IT')
)
# Explicit level order makes 'HR' the reference level
data$department <- factor(data$department, levels = c('HR', 'Finance', 'IT'))
model <- lm(performance ~ department, data = data)
summary(model)

Output

Call:
lm(formula = performance ~ department, data = data)

Residuals:
 1  2  3  4  5 
 0 -5 -5  5  5 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)         75.000      7.071  10.607  0.00877 **
departmentFinance   10.000      8.660   1.155  0.36754   
departmentIT        15.000      8.660   1.732  0.22540   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.071 on 2 degrees of freedom
Multiple R-squared:  0.6,    Adjusted R-squared:  0.2 
F-statistic: 1.5 on 2 and 2 DF,  p-value: 0.4
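The coefficient table tests each non-reference level against the reference one at a time. To ask whether the department factor matters as a whole, and to get a fitted value per department, something along these lines can be used. The names overall, new_data, and preds are illustrative and not part of the original example.

```r
# Same data as in the example above
data <- data.frame(
  performance = c(75, 80, 85, 90, 95),
  department  = factor(
    c('HR', 'Finance', 'IT', 'Finance', 'IT'),
    levels = c('HR', 'Finance', 'IT')
  )
)
model <- lm(performance ~ department, data = data)

# One F-test for the department factor as a whole,
# rather than one t-test per non-reference level
overall <- anova(model)
print(overall)

# Predicted performance for each department; new data must use
# the same factor levels as the data the model was fitted on
new_data <- data.frame(
  department = factor(c('HR', 'Finance', 'IT'), levels = levels(data$department))
)
preds <- predict(model, newdata = new_data)
preds
```

With only five observations, neither the individual t-tests nor the overall F-test has much power, so the example is best read as a demonstration of the mechanics rather than as evidence about departments.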

Summary

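In this tutorial, we learned how to use factors in regression analysis in R: convert categorical variables with factor(), fit the model with lm(), and read each coefficient in summary() relative to the factor's reference level.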
