
Creating Temporary Views and Global Views in Spark SQL




Spark SQL lets you query structured data with familiar SQL syntax. To make a DataFrame queryable from SQL, you register it as a view: either a temporary view or a global temporary view.

The sections below explain what each kind of view is and how the two differ, with PySpark examples.

What is a Temporary View?

A temporary view is a session-scoped view created from a DataFrame. It is only accessible within the same SparkSession. Once the session ends, the view is dropped automatically.

Example: Creating and Using a Temporary View


from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("TempViewExample").getOrCreate()

# Sample data
data = [("Alice", 25), ("Bob", 30), ("Cathy", 27)]
columns = ["name", "age"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Create a temporary view
df.createOrReplaceTempView("people")

# Run a SQL query on the temporary view
result = spark.sql("SELECT * FROM people WHERE age > 26")
result.show()
    
+-----+---+
| name|age|
+-----+---+
|  Bob| 30|
|Cathy| 27|
+-----+---+
    

Here, we created a DataFrame of names and ages and registered it as a temporary view named people. We then queried the view with standard SQL from the same session.

Question:

If we close the notebook or stop the Spark session, will the people view still be available?

Answer:

No. A temporary view only lasts as long as the current Spark session. It will be removed automatically when the session ends.

What is a Global Temporary View?

A global temporary view is tied to a system-wide global_temp database and is accessible across multiple SparkSessions within the same application.

This is useful when you want to share a view between different parts of your program, or across Databricks notebooks attached to the same cluster.

Example: Creating and Accessing a Global Temporary View


# Continuing with the same DataFrame
df.createOrReplaceGlobalTempView("global_people")

# Accessing the global view
global_result = spark.sql("SELECT * FROM global_temp.global_people WHERE age < 28")
global_result.show()
    
+-----+---+
| name|age|
+-----+---+
|Alice| 25|
|Cathy| 27|
+-----+---+
    

Note how we used global_temp.global_people to query the global view. This prefix is mandatory because global views are stored in a reserved database called global_temp.

Question:

Can we access global_people in a different SparkSession?

Answer:

Yes. A global temporary view is accessible from any SparkSession in the same Spark application through the global_temp namespace. It is dropped only when the application terminates, so it outlives the session that created it.

Temporary vs Global Views

| Feature   | Temporary View         | Global Temporary View        |
|-----------|------------------------|------------------------------|
| Scope     | Session-scoped         | Application-scoped           |
| Namespace | No prefix needed       | Requires global_temp. prefix |
| Lifetime  | Ends with the session  | Ends with the application    |
| Use Case  | Single-session queries | Sharing across sessions      |

Summary

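Use a temporary view when a query only needs to run within one session, and a global temporary view when several sessions in the same application need to share the same data. Knowing which one to use helps you organize your data pipelines and get more out of Spark SQL.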


