Creating Temporary Views and Global Views in Spark SQL
Apache Spark SQL lets you run SQL queries on structured data using a familiar syntax. To make your data queryable using SQL, you can create views from DataFrames — either temporary views or global temporary views.
Let’s understand what these views are and how they differ with detailed examples using PySpark.
What is a Temporary View?
A temporary view is a session-scoped view created from a DataFrame. It is only accessible within the same SparkSession. Once the session ends, the view is dropped automatically.
Example: Creating and Using a Temporary View
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.appName("TempViewExample").getOrCreate()
# Sample data
data = [("Alice", 25), ("Bob", 30), ("Cathy", 27)]
columns = ["name", "age"]
# Create DataFrame
df = spark.createDataFrame(data, columns)
# Create a temporary view
df.createOrReplaceTempView("people")
# Run a SQL query on the temporary view
result = spark.sql("SELECT * FROM people WHERE age > 26")
result.show()
+-----+---+
| name|age|
+-----+---+
|  Bob| 30|
|Cathy| 27|
+-----+---+
Here, we created a DataFrame with names and ages, then created a people temporary view. We queried it using standard SQL inside the same session.
Question:
If we close the notebook or stop the Spark session, will the people view still be available?
Answer:
No. A temporary view only lasts as long as the current Spark session. It will be removed automatically when the session ends.
What is a Global Temporary View?
A global temporary view is registered in a system-preserved database called global_temp and is accessible from every SparkSession within the same Spark application. It is dropped automatically when the application terminates.
This is useful when you want to share a view between different parts of your program or, in Databricks, across notebooks attached to the same cluster.
Example: Creating and Accessing a Global Temporary View
# Continuing with the same DataFrame
df.createOrReplaceGlobalTempView("global_people")
# Accessing the global view
global_result = spark.sql("SELECT * FROM global_temp.global_people WHERE age < 28")
global_result.show()
+-----+---+
| name|age|
+-----+---+
|Alice| 25|
|Cathy| 27|
+-----+---+
Note how we used global_temp.global_people to query the global view. This prefix is mandatory because global temporary views are stored in a reserved database called global_temp.
Question:
Can we access global_people in a different SparkSession?
Answer:
Yes. Global temporary views are accessible from any SparkSession in the same Spark application through the global_temp namespace. They outlive the session that created them and are removed only when the application terminates.
Temporary vs Global Views
| Feature | Temporary View | Global Temporary View |
|---|---|---|
| Scope | Session-scoped | Application-scoped |
| Namespace | No prefix needed | Requires global_temp. prefix |
| Lifetime | Ends with the session | Ends with the application |
| Use Case | Single-session queries | Sharing across sessions |
Summary
- Use temporary views for quick, session-local SQL access to DataFrames.
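- Use global temporary views to share views between different sessions in the same application.
- Temporary views are simpler; global temporary views are more flexible because several sessions in one application can read them, but they are not visible to separate Spark applications.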
Understanding when and how to use views will help you organize your data pipelines effectively and take full advantage of Spark SQL capabilities.