# Mastering Applied Data Science

- **In-person Training**
- **Innovative Way to Learn**
- **PreWork, One-on-One**
- **100% Hands-On**
- **Mentoring Service, One-on-One**
- **Project Based Learning**
- **Job Assistance**
- **Lifetime Membership, Repeat Project or Session Anytime**
- **Mastermind Groups**
- **Kaggle Challenges**
- **Competitively Priced**
- **Financing Options**

**6 Weeks Data Science Applied Labs**

**Learn the skills you need** to become a data scientist in our 16-week program led by a team of industry experts.

- Build Python skills for programming
- Mine data sets and apply data cleaning techniques to prepare your data for analysis
- Master data visualization techniques
- Build and implement predictive models and algorithms using **machine learning**
- Learn models like linear regression, logistic regression, classification models, K-nearest neighbors, and random forest
- Gain experience with specializations like Natural Language Processing, Computer Vision, Big Data, Recommender Systems, and Deep Learning

**6 Week Project Based Learning**

**Project Based Learning** is a dynamic approach in which you solve real-world problems to gain knowledge and skills. Through this learning experience, you investigate and respond to an engaging and complex question, problem, or challenge. Our goal in building Project Based Learning is to give you the skills to reach your career goals from every angle data science requires: business, mathematics, & programming.

- Hands-on learning and real-life projects
- Kaggle competitions
- 100% in-person instruction from experts in data science
- Networking within our data science community
- A personal GitHub showcase to serve as your portfolio

#### Mastering Data Science Applied Labs

# PreWork

#### Session I : Python 101

Whether you are familiar with programming or not, our Python PreWork sessions introduce the fundamentals of Python, such as variables, string fundamentals, if-else statements, try & except statements, for loops, while loops, break & continue statements, & lambda functions, as well as data types relevant to data science, like lists, tuples, dictionaries, & sets. The activities in these sessions will serve as a reference for students' questions as the classes move forward.

#### Session II : Statistics 101

The hands-on statistics portion of PreWork establishes a surface-level understanding of concepts such as types of variables, like numerical vs categorical, nominal vs ordinal, & continuous vs discrete; measures of central tendency, like when to use the mean, when to consider the median, & when to fall back on the mode; relationships between variables, like correlation & independence; and finally hypothesis testing & p-values, but only to the degree needed to apply the mindset to data science. These concepts will be reviewed in the program to ensure that students' questions are addressed.
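
As a small illustration of when the mean, median, and mode diverge, the sketch below uses Python's standard `statistics` module on a made-up list of salaries. The numbers are hypothetical and chosen only so that one outlier is present.

```python
import statistics

# Hypothetical salaries in thousands; the last value is an outlier.
salaries = [42, 45, 45, 48, 51, 55, 400]

mean = statistics.mean(salaries)      # pulled upward by the outlier
median = statistics.median(salaries)  # middle value, robust to the outlier
mode = statistics.mode(salaries)      # most frequent value

print(f"mean={mean:.1f}, median={median}, mode={mode}")
```

With skewed data like this, the median is usually the more representative "typical" value, while the mode is the natural choice for categorical data.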

#### Session III : SQL 101

While some of the tools used in Python will take the place of SQL functions & methods, it is still beneficial to understand the origins of these tools & to be able to replicate them in SQL when future work calls for it. A solid portion of data science job postings ask for experience querying large data sets with SQL databases, like Microsoft SQL Server & PostgreSQL, or with NoSQL databases, like MongoDB & DynamoDB. We will glimpse at scenarios for both to further solidify the students' candidacy.
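
To make the "SQL origins" point concrete, here is a minimal sketch of a group-and-sum query, the kind of operation that Python data tools later replicate. It uses SQLite from Python's standard library as a stand-in for the database servers named above, and the `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.0), ("ana", 15.0), ("ben", 5.0), ("cy", 50.0)],
)

# SQL aggregation: total spend per customer, largest total first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC, customer"
).fetchall()
conn.close()

print(rows)
```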

#### Session IV : Web 101

An introduction to HTML & CSS is key to future project building & to publishing blog posts on student progress throughout the program. A good proportion of relevant data is out on the web for us to use, so we will introduce how to grab that information from HTML pages within our Python environments in Day 3. Furthermore, once students are in Project Based Learning, GitHub portfolios are best displayed in themes that students choose & customize with HTML & CSS.

# Applied Labs

#### Session I : Introduction to Data Science with Python

In our first class, we will review some intermediate functions in Python & then introduce the expected mindset of a data scientist versus the traditional viewpoint, as well as how to take full advantage of the program by using the Applied Labs environment. We will encourage students to introduce themselves to each other & learn each other's strengths, along with the instructor's experience, so that they not only grasp the skills & tools a data scientist is expected to know, but also know exactly when to use which tools & why, through peer & real-life learning. There will also be an introduction to the CRISP-DM data science methodology & chosen framework, along with the distinction between the two mindsets of machine learning: supervised learning & unsupervised learning. The session has two miniature projects, Temperature & Christmas, to wrap up Python essentials.

##### Project 1: How Much Longer Until Christmas?
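
The project brief is not spelled out here, so the following is only one plausible reading of it: a function that counts the days from a given date to the next December 25, using the standard `datetime` module. The function name `days_until_christmas` is an assumption made for illustration.

```python
from datetime import date


def days_until_christmas(today: date) -> int:
    """Return the number of days from `today` to the next December 25."""
    christmas = date(today.year, 12, 25)
    if today > christmas:
        # Christmas has already passed this year, so count to next year's.
        christmas = date(today.year + 1, 12, 25)
    return (christmas - today).days


print(days_until_christmas(date.today()))
```

Passing the date in as a parameter, rather than calling `date.today()` inside the function, keeps the logic easy to check against known dates.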

#### Session II : Exploratory Data Analysis

We start by asking which questions data science can help answer, so that students can tell a data analytics question from a data science question. We then break down the key checklist items, in the form of questions, that each CRISP-DM stage requires before moving further in the cycle. We again showcase the differences between supervised learning & unsupervised learning & explain why supervised learning is the method most of us will encounter, while unsupervised learning can reveal more patterns in data than we might imagine. We introduce the self-checking mindset of what counts as good data for data science projects: what good data is & how we can tell bad data from good data, & we let the students ponder how to tackle dirty data. We then give students the attributes that distinguish big data from small data through the four V's. A short review follows of the differences between types of variables, numerical vs categorical, along with a short case on where statistics is needed most in data science: the data analysis phase. The hands-on portion of the class familiarizes students with NumPy & Pandas & showcases how to clean, manipulate, & analyze data by applying those concepts. Students will be given the data set for Titanic, a Kaggle competition known for introductory data science methods & cleaning, & will practice their data analysis skills on it with Pandas to get into the data science mindset of being result-oriented instead of process-oriented.

##### Project 2: Exploration of Titanic
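
As a rough sketch of the clean-then-analyze loop this project practices, the example below fills in missing ages and computes a survival rate per group. It deliberately uses only the standard library rather than Pandas, and the six passenger records are hypothetical, not rows from the Kaggle data set.

```python
import statistics

# Hypothetical passenger records in the spirit of the Titanic data set.
# None marks a missing age.
passengers = [
    {"sex": "female", "age": 29.0, "survived": 1},
    {"sex": "male", "age": None, "survived": 0},
    {"sex": "female", "age": 4.0, "survived": 1},
    {"sex": "male", "age": 40.0, "survived": 0},
    {"sex": "male", "age": 22.0, "survived": 1},
    {"sex": "female", "age": None, "survived": 0},
]

# Cleaning: fill missing ages with the median of the known ages.
known_ages = [p["age"] for p in passengers if p["age"] is not None]
median_age = statistics.median(known_ages)
for p in passengers:
    if p["age"] is None:
        p["age"] = median_age

# Analysis: survival rate per group.
survival_rate = {}
for sex in sorted({p["sex"] for p in passengers}):
    group = [p["survived"] for p in passengers if p["sex"] == sex]
    survival_rate[sex] = sum(group) / len(group)

print(median_age, survival_rate)
```

In Pandas the same two steps are typically written with `fillna` and `groupby`, which is the form the session itself works in.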

#### Session III : Data Visualization & Information Analysis

We start off by asking what the purpose of visualization is in data science, building on students' experiences with charting & decision making with charts. A review of NumPy functions for generating different types of data is done before a brief introduction to Matplotlib's figure attributes & properties. Instructors will continue by explaining the most common analysis-based visuals, such as histograms & scatterplots. An intermediate approach to Titanic is used for exercises in graphing with Matplotlib & in judging whether a graph is useful or not. We continue by creating a Python-based method for web scraping & with an introduction to JSON. There are further functions & helpful tips to consider when analyzing data with Pandas, such as common Excel functions reimplemented to draw out insights. The day ends with a project on what happened during the 2012 election & whether polling data can give us clues as to who was more likely to win. Students are expected to create a GitHub repository by the end of this session & will learn how to create their own blog & begin to publish content.

##### Project 3: Election Day Results

#### Session IV : Machine Learning

We will review the difference between supervised learning & unsupervised learning, asking students why certain scenarios are not a good fit for supervised learning. We then introduce the two result-oriented methods of supervised learning, regression & classification, & how they differ. The day is dedicated to identifying a regression problem, moving from initial analysis to modeling with regression methods, assessing the models, & then optimizing for the best results by different metrics. Afterwards, students will work on building one of the regression models introduced, such as linear, polynomial, ridge, lasso, gradient, or robust regression, followed by an introduction to logistic regression for classification. The day ends with a Kaggle-based project using regression.
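
The fit-then-assess loop described above can be sketched in a few lines. This is a minimal example of ordinary least squares with a single feature, written in plain Python rather than a machine learning library, and scored with RMSE and R-squared. The house sizes and prices are hypothetical.

```python
import math

# Hypothetical data: house size (hundreds of sq ft) vs price (thousands).
sizes = [10.0, 15.0, 20.0, 25.0, 30.0]
prices = [150.0, 200.0, 260.0, 300.0, 340.0]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Ordinary least squares for one feature: slope = cov(x, y) / var(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / sum(
    (x - mean_x) ** 2 for x in sizes
)
intercept = mean_y - slope * mean_x

# Assess the fit on the same data with RMSE and R-squared.
predictions = [intercept + slope * x for x in sizes]
ss_res = sum((y - p) ** 2 for y, p in zip(prices, predictions))
ss_tot = sum((y - mean_y) ** 2 for y in prices)
rmse = math.sqrt(ss_res / n)
r_squared = 1 - ss_res / ss_tot

print(f"price = {intercept:.1f} + {slope:.2f} * size")
print(f"RMSE = {rmse:.2f}, R^2 = {r_squared:.4f}")
```

Scoring on the training data, as done here for brevity, overstates how well a model generalizes; in practice the metrics would be computed on held-out data.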

##### Project 4: Optimizing House Price Prediction

#### Session V : Advanced Machine Learning

Revisiting the results students ended their House Pricing project with, we will give more hints & clues on how to take the project further. We will then dive into the second supervised learning need: classification algorithms, such as Naïve Bayes, Decision Trees, Random Forest, & other methods based on regression. Students are expected to be able to identify which algorithm to use based on the data & which methods can further optimize classification algorithms to suit the insights & decisions needed. Students will also learn regression metrics such as R-squared, MSE, & RMSE, & classification scoring using precision, recall, sensitivity, specificity, accuracy, ROC curves, & AUC, along with gains & lift charts. The session ends with a Spam Classifier project, which alludes to the processes of Natural Language Processing.
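
Several of the classification scores named above fall directly out of the four confusion-matrix counts. The sketch below computes them by hand for a made-up set of spam labels and predictions; the data is hypothetical.

```python
# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # spam caught
tn = sum(t == 0 and p == 0 for t, p in pairs)  # ham left alone
fp = sum(t == 0 and p == 1 for t, p in pairs)  # ham flagged as spam
fn = sum(t == 1 and p == 0 for t, p in pairs)  # spam missed

precision = tp / (tp + fp)      # of flagged emails, how many were spam
recall = tp / (tp + fn)         # of spam emails, how many were caught (sensitivity)
specificity = tn / (tn + fp)    # of ham emails, how many were left alone
accuracy = (tp + tn) / len(pairs)

print(precision, recall, specificity, accuracy)
```

For a spam filter, a false positive (a real email sent to the spam folder) is often costlier than a false negative, which is one reason to look at precision and not accuracy alone.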

##### Project 5: Classification of Spam Emails

#### Session VI : Hack Day

Students will be separated into two groups & given the chance to truly practice their skills, with an emphasis on visualization & modeling with machine learning, in a live Kaggle competition. While working with others on the project, students will also be encouraged to identify the gaps in their skills, especially in analysis & modeling, & to review as much as possible before moving on to the projects in the remaining sessions.

##### Project 6: Baseline Kaggle Competition

#### Session VII : Recommender Systems

Students will review machine learning algorithms & be introduced to types of recommender systems, like collaborative filtering with k-nearest neighbors based on either items or users, as in Amazon's recommendations. Students will then build their own recommender system with the MovieLens data set, considering which method is the best choice & how it fits with what viewers of the recommendations will find most useful. They will also come to understand dimension reduction with PCA (principal component analysis), explore SVMs (support vector machines), & learn A/B testing with t-tests & p-values.
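
To show what item-based collaborative filtering looks like at its simplest, the sketch below finds the movie whose rating pattern is closest to a given movie using cosine similarity. The ratings dictionary is hypothetical and not taken from MovieLens, and the helper names `item_vector`, `cosine`, and `most_similar` are assumptions made for illustration.

```python
import math

# Hypothetical user -> {movie: rating} data; not taken from MovieLens.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "B": 5},
}


def item_vector(item):
    """Ratings for `item` across all users, with 0 standing in for 'not rated'."""
    return [ratings[user].get(item, 0) for user in sorted(ratings)]


def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(item):
    """Return the other item whose rating pattern is closest to `item`."""
    items = {movie for user_ratings in ratings.values() for movie in user_ratings}
    others = sorted(items - {item})
    return max(others, key=lambda o: cosine(item_vector(item), item_vector(o)))


print(most_similar("A"))
```

Treating a missing rating as 0 is a simplification; fuller implementations usually compare only users who rated both items or center ratings on each user's mean first.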

##### Project 7: MovieLens Through Recommendations

#### Session VIII : Natural Language Processing & Sentiment Analysis

Students will explore the Natural Language Toolkit to process & extract text data, learning about tokenization of words & sentences, part-of-speech tagging, & stemming & lemmatization for the best analysis of textual data. Students will then start a Natural Language Processing project with Yelp data before we move on to sentiment analysis to predict positive versus negative Yelp reviews.
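
Sentence and word tokenization can be illustrated without any extra installs. The sketch below uses regular expressions from the standard library as a simplified stand-in for the Natural Language Toolkit's tokenizers, and the review text is made up.

```python
import re

# A made-up review; not taken from the Yelp data set.
review = "The tacos were great. Service was slow, though!"

# Sentence tokenization: split after ., ! or ? when followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", review.strip())

# Word tokenization: lowercase runs of letters and apostrophes per sentence.
words = [re.findall(r"[a-z']+", sentence.lower()) for sentence in sentences]

print(sentences)
print(words)
```

A rule this simple breaks on abbreviations such as "Dr." or "e.g.", which is one reason to prefer a dedicated tokenizer on real review text.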

##### Project 8: Yelp Reviews & the Truth from Customers

#### Session IX : Big Data with Spark & Splunk

Students will be introduced to Big Data & data engineering with the Hadoop ecosystem, the MapReduce paradigm, Apache Spark, & the up-and-coming Splunk, where real-time data is presented in a dashboard format for easier assessment. An existing project, such as MovieLens, will be transferred to AWS to expose students to the difference.

##### Project 9: MovieLens Through Big Data & Splunk

#### Session X : Deep Learning and Time Series

Instructors will make sure that students' understanding of unsupervised learning & supervised learning is clear & show where deep learning comes in. We will introduce deep learning through TensorFlow, training a neural network & visualizing what it has learned using TensorFlow Playground. Students will also learn about time series: what makes them special & how to load & handle them in Pandas. Students will understand how seasonality affects trends. Projects for this session include handwriting recognition & digital face recognition.

##### Project 10: Hand-writing Recognition

#### Session XI : Computer Vision with OpenCV and Hack Project

After initial installation, we will expand on why letting computers understand images is easier said than done when compared to the way humans & their eyes process images. Then, students will be introduced to computer vision fundamentals using OpenCV to detect faces, people, cars, & other objects, even when images are rotated or scaled. Projects will use sensors such as a student's webcam to create a real-time facial recognition program & object recognition program.

##### Project 11: Facial Recognition

#### Session XII : Hack Day

In the last session, we will host a private Kaggle competition amongst the students. Students will be grouped into teams and will showcase their group project at the end of class. This will also be a career day, where we will assess students on their presentation skills, as well as their business skills in terms of the project.

##### Project 12: Private Kaggle Competition

#### Project Based Learning

**Project I**

#### Skill Assessment

Students will apply the Cross Industry Standard Process for Data Mining (CRISP-DM) to a provided data set to understand the process behind starting a new project. We will recommend that individual students tackle the aspects of CRISP-DM they need more practice in, such as visualization, data understanding, or modeling.

**Project II**

#### End-to-End Development

Students will undertake a new project from start to finish. This project will allow students to demonstrate their skills in data acquisition, data cleaning, data enrichment, modeling, evaluation, and deployment.

**Project III**

#### Your Own Data

As a third project, theDevMasters encourages students to bring their own data in their chosen domain for additional mentoring. Since these projects might call for more opinions & guidance, theDevMasters will transfer the third project over to the Mastermind group.