Big Data with Apache Spark

Whether you have programming experience or are getting started for the first time, this workshop will put you on the fast track to honing your Python and data analysis skills. In this bootcamp, you’ll get hands-on programming experience in Python that you can apply immediately in the real world. The workshop covers the fundamentals of Python and several tools used in data science.

This workshop is for analysts, product managers, mathematicians, business managers, or anyone else who wants to learn how to code in Python.


In this workshop you’ll learn the end-to-end data science process:

  • Collect data from a variety of sources (e.g., Excel, web-scraping, APIs and others)
  • Explore large data sets
  • Clean and “munge” the data to prepare it for analysis
  • Apply machine learning algorithms to gain insight from the data
  • Visualize the results of your analysis
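
As a rough illustration of how these steps fit together, here is a minimal sketch that uses only the Python standard library. The sample data, the column names (ad_spend, sales), and the function names are hypothetical and chosen for this example only; they are not taken from the course material, and the tools used in class may differ.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for an Excel export or an API response.
RAW_CSV = """ad_spend,sales
10,25
20,45
,60
30,65
forty,85
50,105
"""


def collect(text):
    """Collect: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: keep only rows where both fields parse as numbers."""
    pairs = []
    for row in rows:
        try:
            pairs.append((float(row["ad_spend"]), float(row["sales"])))
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
    return pairs


def explore(pairs):
    """Explore: compute a few summary statistics."""
    return {
        "rows": len(pairs),
        "mean_spend": statistics.mean(x for x, _ in pairs),
        "mean_sales": statistics.mean(y for _, y in pairs),
    }


def fit_line(pairs):
    """Model: least-squares fit of sales = slope * ad_spend + intercept."""
    mean_x = statistics.mean(x for x, _ in pairs)
    mean_y = statistics.mean(y for _, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bar_chart(pairs, scale=10):
    """Visualize: one text bar per row, one '#' per `scale` units of sales."""
    return [f"{x:>6.0f} | {'#' * int(y // scale)}" for x, y in pairs]


if __name__ == "__main__":
    data = clean(collect(RAW_CSV))
    print(explore(data))
    print("slope, intercept:", fit_line(data))
    print("\n".join(bar_chart(data)))
```

Each function corresponds to one step in the list above, so any single stage can be swapped out (for example, a different data source or a different model) without touching the others.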

This is a practical, hands-on workshop with many in-class exercises. You’ll build your own library of Python scripts that you can reuse after you’re done with the course.

Prereqs & Preparation

You must bring a laptop with a text editor.

Sublime Text is recommended and has a free trial version.

In addition, students should install Anaconda, a free distribution that includes Python and a number of tools that will be used in class.

Day 1

Section: 1 – Introduction to Spark

  • Lecture 1: Course Introduction

Section: 2 – Introduction and Application of Apache Spark

  • Lecture 2: Understanding Apache Spark
  • Lecture 3: Install Apache Spark on Cluster
  • Lecture 4: Apache Spark Scala API
  • Lecture 5: Running Apache Spark Code

Section: 3 – Apache Spark with YARN Cluster

  • Lecture 6: Apache Spark in YARN Context
  • Lecture 7: Apache Spark with YARN
  • Lecture 8: YARN Clusters
  • Lecture 9: Bonus Video – YARN on Eclipse

Day 2

Section: 4 – Apache Spark APIs

  • Lecture 10: Different types of Spark Applications
  • Lecture 11: Spark with Gradle
  • Lecture 12: Spark Applications – SQL Library

Section: 5 – Spark Applications

  • Lecture 13: Spark Streaming Applications
  • Lecture 14: Twitter Stream Application
  • Lecture 15: Lambda Architecture

Section: 6 – Course Summary

  • Lecture 16: Summary