This guide walks you through setting up Apache Spark on a Windows machine from scratch. We’ll install the required components step by step: Java, Hadoop (Winutils), and Spark, and then configure PySpark for Python usage.
Apache Spark needs Java to run. We recommend using Java 8 or Java 11.
Install the JDK to a folder such as C:\Program Files\Java\jdk-11.0.x
Set the environment variable JAVA_HOME = Path to your Java installation
Add %JAVA_HOME%\bin to the Path variable
Why does Spark need Java?
Spark is written in Scala, which runs on the Java Virtual Machine (JVM). Installing Java provides the runtime environment Spark needs.
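A quick way to confirm the Java step is to check that JAVA_HOME points at a folder containing the Java executable. The following is a minimal sketch, not part of the original guide; the helper name find_java is hypothetical, and it assumes the standard JDK layout with the executable under bin.

```python
import os


def find_java(env):
    """Return the path to the Java executable under JAVA_HOME, or None."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None
    # Windows ships java.exe; other platforms ship a plain "java" binary.
    for name in ("java.exe", "java"):
        candidate = os.path.join(java_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_java(os.environ)
    print("Java found at:", found if found else "not found; check JAVA_HOME")
```

Run it from a new Command Prompt window, since environment variable changes only reach programs started after the change.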
Spark uses Hadoop's file system APIs. On Windows, we need a helper binary called winutils.exe.
Download winutils.exe from a trusted source or GitHub mirror.
Create the folder C:\hadoop\bin and place winutils.exe inside it.
Set the environment variable HADOOP_HOME = C:\hadoop
Add %HADOOP_HOME%\bin to the Path
Why do we need winutils.exe?
Spark expects Hadoop-like commands (such as managing file permissions), and winutils.exe provides this compatibility on Windows.
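To check that winutils.exe ended up where Spark will look for it, a small script can test for the file under HADOOP_HOME. This is an illustrative sketch; the function name find_winutils is hypothetical, and it only checks that the file exists, not that the binary matches your Hadoop version.

```python
import os


def find_winutils(env):
    """Return the path to winutils.exe under HADOOP_HOME\\bin, or None."""
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    candidate = os.path.join(hadoop_home, "bin", "winutils.exe")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    found = find_winutils(os.environ)
    print("winutils.exe:", found if found else "not found; check HADOOP_HOME")
```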
Download Apache Spark and extract it to C:\spark
Set the environment variable SPARK_HOME = C:\spark
Add %SPARK_HOME%\bin to the Path
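With all three variables set, it can help to confirm that each bin folder actually made it onto the Path. The sketch below is one possible check under stated assumptions: the function name missing_path_entries is hypothetical, and it uses ntpath so that Windows path rules (backslashes, case-insensitive comparison) apply regardless of where the script runs.

```python
import ntpath
import os


def missing_path_entries(env, names=("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"), sep=";"):
    """List problems with the given *_HOME variables and their bin folders on Path."""

    def norm(path):
        # Windows paths are case-insensitive; also drop quotes and trailing slashes.
        return ntpath.normcase(ntpath.normpath(path.strip().strip('"')))

    raw_path = env.get("Path", env.get("PATH", ""))
    on_path = {norm(entry) for entry in raw_path.split(sep) if entry.strip()}

    problems = []
    for name in names:
        home = env.get(name)
        if not home:
            problems.append(name + " is not set")
            continue
        bin_dir = ntpath.join(home, "bin")
        if norm(bin_dir) not in on_path:
            problems.append(bin_dir + " is not on Path")
    return problems


if __name__ == "__main__":
    issues = missing_path_entries(os.environ)
    print("\n".join(issues) if issues else "All bin folders are on the Path")
```

The separator defaults to ";" because that is what Windows uses between Path entries; splitting on ":" would break drive letters such as C:.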
Open Command Prompt and run:
spark-shell
This should launch the interactive Spark shell in Scala.
Do I need to learn Scala to use Spark?
No. As a beginner, you can use PySpark, which allows you to use Python with Spark.
pip install pyspark
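Before starting a Spark session, you can confirm that pip installed the package into the Python interpreter you are actually using. This is a small sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyspark")
    print("pyspark version:", version if version else "not installed")
```

If this reports that pyspark is not installed even though pip succeeded, pip and python may belong to different Python installations.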
To verify everything is working:
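from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session
spark = SparkSession.builder.appName("LocalSparkTest").getOrCreate()

# Build a small DataFrame from a list of (Name, Age) tuples
df = spark.createDataFrame([
    ("Alice", 25),
    ("Bob", 30),
    ("Charlie", 35)
], ["Name", "Age"])

# Print the rows as a formatted table
df.show()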
+-------+---+
|   Name|Age|
+-------+---+
|  Alice| 25|
|    Bob| 30|
|Charlie| 35|
+-------+---+
This confirms Spark is installed correctly and working with Python (PySpark).
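To recap: Java is installed with JAVA_HOME set, winutils.exe is in place with HADOOP_HOME set, Spark is extracted with SPARK_HOME set, and PySpark is installed through pip.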
Once installed, you're ready to explore Spark’s powerful data processing features using Python.