Big Data using R
Learn how to use Big Data using R Enterprise for high-performance analytics on datasets that exceed the normal physical memory limits of R. This bootcamp combines lectures and labs to teach students how to use and script Big Data using R functions effectively for big data analyses. You will also learn how to visualize the results with graphical packages.
This workshop is for analysts, product managers, mathematicians, business managers, or anyone else who wants to learn how to code in Python.
In this workshop you’ll learn the end-to-end data science process:
- Collect data from a variety of sources (e.g., Excel, web-scraping, APIs and others)
- Explore large data sets
- Clean and “munge” the data to prepare it for analysis
- Apply machine learning algorithms to gain insight from the data
- Visualize the results of your analysis
This is a practical, hands-on workshop with many class exercises. You’ll build your own library of Python scripts that you can reuse after you’re done with the course.
Prereqs & Preparation
You must bring a laptop with a text editor.
Sublime Text is recommended and has a free trial version (http://www.sublimetext.com/).
In addition, students should install Anaconda, a free distribution that includes Python and a number of tools that will be used in class (http://continuum.io/downloads).
Session I: Intro to Python Fundamentals
- Introduction to Data Types
- If Statements
- For Loops
- Understanding lists, tuples, and dictionaries
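As a rough preview of the Session I topics, the sketch below combines the basic data structures with an if statement and a for loop. The sample values and the grade() helper are hypothetical, invented only for illustration; they are not taken from the course materials.

```python
# Hypothetical sample data, invented for illustration.
scores = [72, 88, 95]            # list: ordered and mutable
point = (3, 4)                   # tuple: ordered and immutable
ages = {"Ana": 31, "Ben": 27}    # dictionary: maps keys to values


def grade(score):
    """Map a numeric score to a letter grade using if/elif."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    return "C"


# A for loop visits each item in the list in order.
grades = []
for s in scores:
    grades.append(grade(s))

# Dictionaries are looked up by key; tuples and lists by position.
print(grades, ages["Ana"], point[0])
```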
Session II: Data Collection and Exploration
- Importing data from a variety of sources
- Data exploration
- Visualizing data using Matplotlib
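The Session II topics might look something like the following minimal sketch, which assumes pandas is used for loading and exploring the data alongside Matplotlib for plotting. The CSV content and column names are hypothetical; in practice the data would come from a file, a URL, or an API.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content, invented for illustration.
csv_text = """city,year,sales
Austin,2013,120
Austin,2014,150
Boston,2013,200
Boston,2014,180
"""

# Import: read_csv accepts a path, a URL, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Explore: look at the first rows, summary statistics, and a grouped total.
print(df.head())
print(df.describe())
totals = df.groupby("city")["sales"].sum()

# Visualize: draw a bar chart of total sales per city.
fig, ax = plt.subplots()
totals.plot(kind="bar", ax=ax)
ax.set_xlabel("City")
ax.set_ylabel("Total sales")

# Render to an in-memory PNG; pass a filename instead to save to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```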
Session III: Data Cleaning and Visualization
- Data manipulation using Pandas
- Data cleaning and formatting
- Feature engineering
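A small sketch of the Session III steps with Pandas is shown below. The table, its column names, and the choice to fill missing values with the median are all assumptions made for illustration, not the method taught in the course.

```python
import pandas as pd

# Hypothetical messy data, invented for illustration.
raw = pd.DataFrame({
    "name": [" Ana ", "ben", "Cara", "ben"],
    "signup": ["2014-01-05", "2014-02-10", None, "2014-02-10"],
    "spend": ["10.5", "n/a", "7", "n/a"],
})

clean = raw.copy()

# Formatting: trim whitespace and normalize capitalization.
clean["name"] = clean["name"].str.strip().str.title()

# Cleaning: drop rows that are exact duplicates after formatting.
clean = clean.drop_duplicates().reset_index(drop=True)

# Manipulation: convert text columns to proper types.
# Unparseable numbers such as "n/a" become NaN.
clean["signup"] = pd.to_datetime(clean["signup"])
clean["spend"] = pd.to_numeric(clean["spend"], errors="coerce")

# Cleaning: fill missing spend with the median (one of several options).
clean["spend"] = clean["spend"].fillna(clean["spend"].median())

# Feature engineering: derive a new column from an existing one.
clean["signup_month"] = clean["signup"].dt.month

print(clean)
```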
Session IV: Machine Learning
- Overview of machine learning
- Implementing machine learning algorithms in Python
- Measuring algorithm performance with cross-validation
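To give a sense of the Session IV topics, here is a minimal sketch of cross-validation. It assumes scikit-learn (bundled with Anaconda) as the machine learning library, which the outline does not name, and it uses the library's built-in iris dataset and a logistic regression model purely as examples.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Example dataset shipped with scikit-learn (an assumption, not course data).
X, y = load_iris(return_X_y=True)

# Any scikit-learn estimator could be used here; this one is only an example.
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on four folds, score on the held-out fold,
# and repeat so that every fold serves as the test set once.
scores = cross_val_score(model, X, y, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Averaging the per-fold scores gives a steadier estimate of performance than a single train/test split.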