R Data Frames


In this tutorial, we will learn about data frames in R. We will cover the basics of creating, accessing, modifying, and performing operations on data frames.


What is a Data Frame

A data frame in R is a two-dimensional, table-like structure in which each column holds the values of one variable and each row holds one observation, with one value per column. Columns may have different types, but every column has the same length. Data frames are the standard way to store tabular data in R.


Creating Data Frames

Data frames can be created in R using the data.frame() function:

df <- data.frame(column1 = c(1, 2, 3), column2 = c('A', 'B', 'C'))

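The above code creates a data frame with two columns and three rows, where column1 contains numeric values and column2 contains character values. In R 4.0.0 and later, strings are kept as character vectors by default; earlier versions converted them to factors unless stringsAsFactors = FALSE was passed.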
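
One way to confirm the column types is to inspect the data frame after creating it. The sketch below assumes R 4.0.0 or later, where strings stay as character vectors by default; col_types is an illustrative name that does not appear elsewhere in this tutorial.

```r
# Recreate the example data frame from above.
df <- data.frame(column1 = c(1, 2, 3), column2 = c('A', 'B', 'C'))

# Collect the class of each column (col_types is an illustrative name).
col_types <- sapply(df, class)
print(col_types)

# dim() returns the number of rows, then the number of columns.
print(dim(df))
```

On an older R version, column2 would likely be reported as a factor instead.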



Creating a Simple Data Frame

  1. We start by creating a data frame named df using the data.frame function.
  2. The data frame has three columns: id and age with numeric values, and name with character values. Because c(1, 2, 3) produces double-precision numbers, id is numeric rather than integer.
  3. We print the data frame to the console to see its structure.

R Program

df <- data.frame(id = c(1, 2, 3), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))
print(df)

Output

  id    name age
1  1   Alice  25
2  2     Bob  30
3  3 Charlie  35
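
The call above stores id as double-precision numbers, because plain literals such as 1 are doubles in R. If integer storage is wanted, a minimal sketch using the L suffix could look like this; df_int is a hypothetical name used only for this illustration.

```r
# c(1, 2, 3) produces doubles; the L suffix produces integers.
df_int <- data.frame(id = c(1L, 2L, 3L), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))

# str() gives a compact summary of each column's type and first values.
str(df_int)

# class() reports the type of a single column.
print(class(df_int$id))   # integer
print(class(df_int$age))  # numeric
```

The printed table looks the same either way; the difference shows up only in str() and class().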


Accessing Data Frame Elements

  1. We create a data frame named df with columns id, name, and age.
  2. We access the name column using the dollar sign $ operator and print it.
  3. We access the element in the first row and second column using the [row, column] notation and print it.

R Program

df <- data.frame(id = c(1, 2, 3), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))
print(df$name)
print(df[1, 2])

Output

[1] "Alice"   "Bob"     "Charlie"
[1] "Alice"
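
Besides $ and [row, column], base R offers a few other ways to reach the same data. The following sketch shows some of them; the variable names on the left-hand side are illustrative and not part of the original example.

```r
df <- data.frame(id = c(1, 2, 3), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))

# Double brackets with a column name return the same vector as df$name.
name_by_brackets <- df[["name"]]

# Single brackets with a column name return a one-column data frame.
name_as_df <- df["name"]

# Leaving the column position empty selects an entire row as a data frame.
second_row <- df[2, ]

# Columns can be selected by name instead of by position.
bob_age <- df[2, "age"]

print(name_by_brackets)
print(second_row)
print(bob_age)
```

Selecting by column name is usually more robust than selecting by position, since it keeps working if columns are reordered.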


Modifying Data Frame Elements

  1. We create a data frame named df with columns id, name, and age.
  2. We modify the age of the second row by assigning a new value to it.
  3. We add a new column named gender to the data frame.
  4. We print the modified data frame to see the changes.

R Program

df <- data.frame(id = c(1, 2, 3), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))
df$age[2] <- 32
df$gender <- c('F', 'M', 'M')
print(df)

Output

  id    name age gender
1  1   Alice  25      F
2  2     Bob  32      M
3  3 Charlie  35      M
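
Rows can be added and columns removed in a similar way. The sketch below is one possible approach using base R's rbind() and assignment of NULL; the fourth row (Dana, 28) is made-up illustrative data, and new_row is a hypothetical name.

```r
df <- data.frame(id = c(1, 2, 3), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))
df$gender <- c('F', 'M', 'M')

# Build the new row as a one-row data frame with matching column names
# (illustrative values, not part of the original example).
new_row <- data.frame(id = 4, name = 'Dana', age = 28, gender = 'F')

# rbind() stacks the rows of both data frames.
df <- rbind(df, new_row)

# Assigning NULL to a column removes it.
df$gender <- NULL

print(df)
```

After these steps df has four rows and the original three columns id, name, and age.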


Filtering Data Frames

  1. We create a data frame named df with columns id, name, and age.
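  2. We filter the data frame to include only rows where the age is greater than 30. The condition df$age > 30 goes in the row position of [row, column], and the empty column position keeps all columns.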
  3. We assign the filtered data frame to a new variable named df_filtered.
  4. We print the filtered data frame.

R Program

df <- data.frame(id = c(1, 2, 3), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))
df_filtered <- df[df$age > 30, ]
print(df_filtered)

Output

  id    name age
3  3 Charlie  35
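
Note that the filtered row keeps its original row name, 3. As a sketch of some common variations, the example below combines two conditions, uses base R's subset() as an alternative to bracket filtering, and resets the row names; df_range and df_subset are illustrative names.

```r
df <- data.frame(id = c(1, 2, 3), name = c('Alice', 'Bob', 'Charlie'), age = c(25, 30, 35))

# Combine conditions with & (and) or | (or); each comparison yields a logical vector.
df_range <- df[df$age >= 30 & df$id < 3, ]
print(df_range)

# subset() evaluates the condition using the column names directly.
df_subset <- subset(df, age > 30)

# Filtering keeps the original row names; assigning NULL renumbers them from 1.
rownames(df_subset) <- NULL
print(df_subset)
```

Here df_range contains only Bob's row, and df_subset contains Charlie's row renumbered as row 1.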