Spark — Save Dataset In Memory Outside Heap

Sandeep Khurana
2 min read · Sep 12, 2021

This article is for readers who already have some familiarity with Spark and its Dataset / DataFrame API. I am going to show how to persist a DataFrame in off-heap memory, so that the executors' heap memory is not used for the persisted data. The example below was written and run in the Scala spark-shell, so the settings are shown in that form. As a side note, use persist or cache only when needed, e.g. when multiple actions run over the same DataFrame or Dataset.

Enable Off Heap Storage

By default, off-heap memory is disabled. You can enable it by setting the configurations below:

  • spark.memory.offHeap.size — Off heap size in bytes
  • spark.memory.offHeap.enabled — value must be true to enable off heap storage

Read more about these settings at https://spark.apache.org/docs/latest/configuration.html#memory-management
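Because the size is given in bytes, a little arithmetic helps when picking a value. The following is a minimal sketch in plain Scala; `gibToBytes` is a hypothetical helper of my own naming, not part of Spark. It also shows that the value used later in this article, 1000000000, is a decimal gigabyte, which is slightly less than 1 GiB.

```scala
// Hypothetical helper (not a Spark API): convert gibibytes to bytes.
def gibToBytes(gib: Long): Long = gib * 1024L * 1024L * 1024L

val oneGib = gibToBytes(1)       // 1073741824
val articleValue = 1000000000L   // the value passed to spark.memory.offHeap.size in this article

println(s"1 GiB = $oneGib bytes")
println(s"Article value as a fraction of 1 GiB: ${articleValue.toDouble / oneGib}")
```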

You can apply these settings as follows:

  • In spark-shell, pass them with --conf flags
spark-shell --conf "spark.memory.offHeap.size=1000000000" --conf "spark.memory.offHeap.enabled=true"
  • With spark-submit, pass the same --conf flags
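If you build the session yourself in a standalone application, the same two settings can also be supplied through the SparkSession builder. This is only a sketch: the application name and the local master are assumptions for illustration. These are memory settings, so they need to be in place before the SparkContext starts; inside an already running spark-shell the session exists, and the --conf route above is the one to use.

```scala
import org.apache.spark.sql.SparkSession

// Sketch for a standalone application.
// appName and master are illustrative assumptions, not values from this article.
val spark = SparkSession.builder()
  .appName("offheap-persist-demo")
  .master("local[*]")
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "1000000000") // size in bytes
  .getOrCreate()

// Read the settings back from the SparkContext configuration to confirm them.
val sparkConf = spark.sparkContext.getConf
println(sparkConf.get("spark.memory.offHeap.enabled"))
println(sparkConf.get("spark.memory.offHeap.size"))
```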

Sample Data

We will use a small sample data set for testing. The data is stored in the file sparkdata.txt, a semicolon-delimited text file with a header row.
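If you do not have such a file at hand, you could create a stand-in one. The sketch below writes a tiny semicolon-delimited file with a header row. The column names and rows are entirely hypothetical placeholders, not the data used in this article; any file of the same shape would do. It also assumes the file is written to the directory from which spark-shell is started and that Spark reads from the local file system.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical stand-in rows: NOT the original sample data of this article.
val sampleLines = Seq(
  "id;name;city",
  "1;Asha;Pune",
  "2;Ravi;Delhi",
  "3;Meera;Chennai"
)

// Write the lines to sparkdata.txt in the current working directory.
val samplePath = Paths.get("sparkdata.txt")
Files.write(samplePath, sampleLines.mkString("\n").getBytes(StandardCharsets.UTF_8))
```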

Read Data And Persist To Off Heap

  • Read
val data = spark.read.format("csv").option("header", "true").option("delimiter", ";").load("sparkdata.txt")
  • Persist
import org.apache.spark.storage._
data.persist(StorageLevel.OFF_HEAP)
  • Show (or any other action)
data.show

Validate Dataframe Was Read From OffHeap For Action

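Open the Spark UI, go to the Storage tab, and check the Storage Level column of the persisted entry. For an off-heap persist, the storage level shown there should mention off heap. The Spark UI is available for spark-shell sessions too. Note that the UI lists persisted Datasets and RDDs together under RDDs.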
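The storage level can also be checked in code, as a complement to the UI. The sketch below is self-contained, so it builds its own local session and uses a small generated DataFrame, `df`, as a stand-in for `data`; the application name, master, and stand-in DataFrame are assumptions for illustration. In spark-shell you could call `data.storageLevel` directly instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Illustrative local session with off-heap storage enabled (appName/master are assumptions).
val spark = SparkSession.builder()
  .appName("offheap-storage-level-check")
  .master("local[*]")
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "1000000000")
  .getOrCreate()

// Stand-in for the article's `data` DataFrame.
val df = spark.range(0, 1000).toDF("id")

df.persist(StorageLevel.OFF_HEAP)
df.count() // an action, so the data is actually materialized

// Dataset.storageLevel reports the level the Dataset is persisted with.
val persistedLevel = df.storageLevel
println(persistedLevel.description)
println(s"useOffHeap = ${persistedLevel.useOffHeap}")

// After unpersist, the level falls back to StorageLevel.NONE.
df.unpersist()
val levelAfterUnpersist = df.storageLevel
println(s"after unpersist: ${levelAfterUnpersist == StorageLevel.NONE}")
```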

Want to experiment more?

  • Unpersist the data, then check the Storage tab in the Spark UI again. It will be empty, as no data is persisted now.
data.unpersist
  • Now persist the same data with a storage level that does not use off-heap memory, e.g. DISK_ONLY
data.persist(StorageLevel.DISK_ONLY)
  • Perform an action, e.g. show
data.show
  • Check the Storage Level of the entry in the Storage tab of the Spark UI, and compare it with the storage level seen earlier for off heap
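To see how the two levels differ without going through the UI, you can inspect the flags on the StorageLevel constants themselves. This is a small sketch that needs only Spark's classes on the classpath, not a running session; MEMORY_ONLY is added purely as an extra point of comparison and is not used elsewhere in this article.

```scala
import org.apache.spark.storage.StorageLevel

// Storage levels to compare; MEMORY_ONLY is included only for contrast.
val levels = Seq(
  "OFF_HEAP"    -> StorageLevel.OFF_HEAP,
  "DISK_ONLY"   -> StorageLevel.DISK_ONLY,
  "MEMORY_ONLY" -> StorageLevel.MEMORY_ONLY
)

// Print the flags that make up each level, plus its human-readable description.
levels.foreach { case (name, level) =>
  println(
    s"$name: useDisk=${level.useDisk}, useMemory=${level.useMemory}, " +
      s"useOffHeap=${level.useOffHeap}, deserialized=${level.deserialized}, " +
      s"description=${level.description}"
  )
}
```

The `useOffHeap` flag is what separates OFF_HEAP from the other levels: it is set for OFF_HEAP and not for DISK_ONLY or MEMORY_ONLY.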
