Setup to run Jupyter Notebook with a PySpark kernel on your local machine

If you want to try out PySpark without installing the necessary libraries or spending too much time configuring your local environment, you can use the Docker image below and get straight to work. It is simple and straightforward, and it lets you test ideas quickly for free.

Using the Jupyter PySpark notebook image

The Jupyter Docker Stacks documentation covers this image and the other notebook images in the stack.

docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook:spark-3.1.1

And that's it! Open the URL with the access token that the container prints in its logs (it starts with http://127.0.0.1:8888) to reach the notebook interface.

You can also mount your working directory into the container; this keeps your notebooks on the host, so you can resume the same work later, for example when you are ready to move to a remote Spark cluster.

docker run --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work jupyter/pyspark-notebook:spark-3.1.1
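Once the container is up, a quick way to confirm the mount from a notebook cell is to list the target directory; /home/jovyan/work is the path used in the -v flag above:

import os

# Files from your host working directory should appear here if the mount worked
print(os.listdir("/home/jovyan/work"))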

Create a new notebook, and run the code below:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark context that uses all available cores
sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))

# Build a SparkSession on top of the existing context
spark = SparkSession \
    .builder \
    .getOrCreate()

# List the effective Spark configuration to confirm everything is wired up
sc.getConf().getAll()
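To verify that the session can actually run a job, you can add a small sanity check in another cell; a minimal sketch (the sample data here is made up purely for illustration):

# Build a tiny DataFrame and run a simple aggregation on the local executor
df = spark.createDataFrame(
    [("alice", 3), ("bob", 5), ("alice", 2)],
    ["name", "value"],
)
df.groupBy("name").sum("value").show()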

Cheers!