Tuning Spark

Given the proven power and capability of Apache Spark for large-scale data processing, we use Spark on a regular basis here at ZGL. To write Spark code that will execute efficiently, it is extremely important to be aware of a set of tuning considerations and tricks. Unlike many other blog posts on Spark tuning, this post is intended to provide a concise checklist of such...

We’ve been taking a closer look at Azure for recent experiments at Zero Gravity Labs. While there are differences from the familiar AWS, I’m finding the Azure Data Lake REST API is a breeze to work with. In this blog post I’ll share a quick example in R.

Use Case

We have a preprocessing job where the data set is relatively small (a few million ndjson records)....

