6 Weeks Data Science Applied Labs
Learn the skills you need to become a data scientist in our 16-week program led by a team of industry experts.
6 Week Project Based Learning
Project Based Learning is a dynamic approach in which you solve real-world problems to gain knowledge and skills. Through this learning experience, you investigate and respond to an engaging and complex question, problem, or challenge. Our goal in building Project Based Learning is to give you the skills to fulfill your career goals from every angle data science requires: business, mathematics, & programming.
Whether you are familiar with programming or not, our Python PreWork sessions introduce the fundamentals of Python,
such as variables, string fundamentals, if-else statements, try & except statements, for loops, while loops, break
& continue statements, & lambda functions, as well as certain data types relevant to data science, like lists, tuples,
dictionaries, & sets for beginning exposure. The activities done in these sessions will serve as a guide for students’ questions in later classes.
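As a rough idea of what these fundamentals look like together, here is a minimal sketch. The temperature readings and every variable name are made up for illustration and are not taken from the PreWork material.

```python
# Illustrative only: the readings and names below are invented for this sketch.
readings = ["21.5", "19.0", "n/a", "23.5", "19.0"]

temperatures = []                 # list: ordered and mutable
for raw in readings:              # for loop
    try:
        temperatures.append(float(raw))
    except ValueError:            # "n/a" cannot be converted, so skip it
        continue

to_fahrenheit = lambda c: c * 9 / 5 + 32    # lambda function

summary = {                                  # dictionary: key -> value
    "count": len(temperatures),
    "unique": set(temperatures),             # set: duplicates removed
    "range": (min(temperatures), max(temperatures)),  # tuple: fixed pair
    "first_f": to_fahrenheit(temperatures[0]),
}

if summary["count"] < len(readings):         # if-else statement
    status = "some readings were skipped"
else:
    status = "all readings were valid"

print(summary, status)
```

The try & except block is what lets the loop survive the unreadable "n/a" entry instead of stopping at the first bad value.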
The hands-on portion of statistics in PreWork establishes a surface-level understanding of concepts
such as types of variables, like numerical vs categorical, nominal vs ordinal, continuous vs discrete;
measures of central tendency, like when to use the mean, when to consider the median, & when to resort to the mode;
relationships between variables, like correlation & independence; ending with hypothesis testing & p-values,
but only to the degree of applying the mindset towards data science. These concepts will be reviewed in
the program to ensure that students’ questions are addressed.
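The choice between mean, median, & mode can be sketched with Python's built-in statistics module. The salary figures and color labels below are hypothetical, chosen only so that one outlier distorts the mean.

```python
import statistics

# Hypothetical salaries (in thousands); the 250 is a deliberate outlier.
salaries = [40, 42, 45, 47, 50, 250]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # stays near the typical value

# Categorical (nominal) data has no mean or median, so we fall back to the mode.
colors = ["red", "blue", "red", "green", "red", "blue"]
most_common_color = statistics.mode(colors)

print(mean_salary, median_salary, most_common_color)
```

Here the mean lands well above what five of the six people earn, which is the usual reason to consider the median for skewed numerical data.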
While some of the tools used in Python will take the place of SQL functions & methods, it is still beneficial
to understand the origins of these tools and to be able to replicate them when future work
calls for it. A solid portion of data science job postings ask for experience running large queries against SQL databases, like
Microsoft SQL Server & PostgreSQL, or NoSQL databases, like MongoDB & DynamoDB; we will look briefly at scenarios for each to further solidify the students’ candidacy.
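To illustrate how a SQL aggregation and its Python counterpart line up, here is a small sketch. It uses SQLite through Python's built-in sqlite3 module purely so the example is self-contained; that choice, the "orders" table, and its rows are assumptions for illustration, not the databases or data used in the sessions.

```python
import sqlite3

# Hypothetical order rows: (customer, amount).
rows = [("ana", 20.0), ("ben", 15.0), ("ana", 30.0), ("ben", 5.0), ("cy", 12.5)]

# SQL version: total spend per customer, largest first.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
sql_totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY SUM(amount) DESC"
).fetchall()
conn.close()

# Python version: the same GROUP BY / SUM replicated with a dictionary.
totals = {}
for customer, amount in rows:
    totals[customer] = totals.get(customer, 0.0) + amount
py_totals = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(sql_totals)
print(py_totals)
```

Both approaches produce the same per-customer totals, which is the sense in which Python tools can stand in for SQL functions.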
An introduction to HTML & CSS is key to future project building & to publishing blog posts on student
progress throughout the program. A good portion of relevant data is available on the web for us to use,
& reading a page’s HTML & CSS structure to pull that information into our
Python environments will be introduced in Day 3. Furthermore, once students are in Project Based Learning,
GitHub portfolios are best displayed in themes that students choose & customize with HTML & CSS.
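To show why knowing HTML structure helps when grabbing web data from Python, here is a minimal sketch using the standard library's html.parser. The page fragment and the LinkCollector class are invented for illustration, and the sessions may well use a different scraping tool.

```python
from html.parser import HTMLParser

# A made-up page fragment; a real page would be downloaded first.
PAGE = """
<html><body>
  <h1 class="title">Poll results</h1>
  <a href="/polls/2012">2012 polls</a>
  <a href="/about">About</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
links = collector.links
print(links)
```

Knowing that links live in the href attribute of a tags is exactly the kind of HTML knowledge that makes the extraction step short.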
In our first class, we will go over some intermediate functions in Python as review & move on to introducing the
expected mindset of a data scientist versus the traditional viewpoint & how to take full advantage of the program by using
the Applied Labs environment. We will encourage students to introduce themselves to each other & learn each other’s
strengths, along with the instructor’s experience, not only to grasp the skills & tools a data scientist is expected to know,
but also to know exactly when to use which tools & why through peer & real-life learning. There will also be an introduction to the
CRISP-DM data science methodology & chosen framework, with the distinctions between the two mindsets of machine
learning: supervised learning & unsupervised learning. The session has two miniature projects, Temperature & Christmas,
to wrap up Python essentials.
We start by asking the questions that data science can help answer so that students can identify the difference between a data
analytics question & a data science question. We further break down the key checklist items, in the form of
questions, that each CRISP-DM stage requires before moving further in the cycle. We again showcase the
differences between supervised learning & unsupervised learning & explain why supervised learning is the
method that most of us will encounter, while unsupervised learning can reveal more patterns in data than we might
imagine. We introduce the self-checking mindset of what is considered good data for data science projects: what is good
data & how can we tell bad data from good data, & we let the students ponder how we can tackle dirty data. We then
give the attributes that help students distinguish big data from small data through the four V’s. We include a small review of the
differences between types of variables, numerical vs categorical, along with a short case of where statistics is
required the most in data science: the data analysis phase. The hands-on portion of the class familiarizes students with
NumPy and Pandas and showcases how to clean, manipulate, and analyze data by applying those concepts. Students
will be given the Titanic data set, from a Kaggle competition known for introductory data science methods & cleaning,
& will practice data analysis skills on it with Pandas to get into the data science mindset of being result-oriented
instead of process-oriented.
We start off by asking what the purpose of visualization is in data science, building on students’ experiences with
charting & decision making with charts. A review of NumPy functions for generating different types of data is done before
a brief introduction to Matplotlib’s figure attributes & properties. Instructors will continue by explaining the most
common analysis-based visuals, such as histograms & scatterplots. An intermediate approach to Titanic is used for
exercises in graphing with Matplotlib & analyzing whether a graph is useful or not. We continue with creating a
Python-based method for web scraping & an introduction to JSON. There are further functions & helpful tips to consider when
analyzing data with Pandas, such as common Excel functions reimplemented to produce insights. The day ends with a project on
what happened during the 2012 election & whether poll data can give us clues as to who was more likely to win. A
GitHub repository is expected to be created by the end of this session, & students will learn how to create their own blog &
begin to publish content.
We will review by explaining the difference between supervised learning and unsupervised learning, asking students why
certain scenarios are not a good fit for supervised learning. Furthermore, the two result-oriented
methods of supervised learning, regression & classification, are introduced separately. The day is dedicated to identifying
a regression problem, moving from initial analysis to modeling using regression methods, assessing the models, then optimizing
for the best results by different metrics. Afterwards, students will work on building one of the regression models
introduced, such as linear, polynomial, ridge, lasso, gradient, & robust, with an introduction to logistic regression for
classification. The day ends with a Kaggle-based project using regression.
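The model, assess, optimize loop for the simplest case, linear regression with one feature, can be sketched in plain Python. The house sizes and prices below are hypothetical, and in class a library would normally do the fitting; this version only spells out the arithmetic.

```python
# Hypothetical data: house size (hundreds of sq ft) vs price (thousands of $).
sizes = [10, 15, 20, 25, 30]
prices = [150, 200, 250, 310, 340]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Ordinary least squares with one feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
sxx = sum((x - mean_x) ** 2 for x in sizes)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predictions = [intercept + slope * x for x in sizes]

# Assess the fit with MSE, RMSE, & R-squared.
residual_ss = sum((y - p) ** 2 for y, p in zip(prices, predictions))
total_ss = sum((y - mean_y) ** 2 for y in prices)
mse = residual_ss / n
rmse = mse ** 0.5
r_squared = 1 - residual_ss / total_ss

print(slope, intercept, mse, rmse, r_squared)
```

Note that these metrics are computed on the same data used for fitting, which is fine for a sketch; a real assessment would hold out unseen data.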
Revisiting the results that students ended their House Pricing project with, we will give more hints & clues on how to
approach the project further. We will then dive into the second supervised learning need: classification algorithms, such as
Naïve Bayes, Decision Trees, Random Forest, and other methods based on regression. Students are expected to be able
to identify when a certain algorithm should be used based on the data & which methods can optimize classification algorithms
further toward what is appropriate for insights & decisions. Students will also learn regression metrics such as R-squared, MSE, & RMSE,
& classification scoring using precision, recall, sensitivity, specificity, accuracy score, & AUC from the ROC curve, along with gains & lift charts.
The session ends with a Spam Classifier project, which alludes to the processes of Natural Language Processing.
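Several of the classification scores named above come straight from the four cells of a confusion matrix. The sketch below assumes a made-up set of ten spam predictions; the labels are illustrative and not results from the Spam Classifier project.

```python
# Hypothetical spam-classifier output: 1 = spam, 0 = not spam.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # spam caught
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # spam missed
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # good mail flagged
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # good mail passed

precision = tp / (tp + fp)        # of mail flagged as spam, how much was spam
recall = tp / (tp + fn)           # of real spam, how much was caught (sensitivity)
specificity = tn / (tn + fp)      # of good mail, how much was passed
accuracy = (tp + tn) / len(pairs)

print(precision, recall, specificity, accuracy)
```

Recall and sensitivity are two names for the same quantity, which is why only one of them is computed here.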
Students will be separated into two groups & get to truly practice their skills, with an emphasis on visualization & modeling
with machine learning, in a live Kaggle competition. While working with others, students will also be
encouraged to identify the gaps in their skills in the project, especially in analysis & modeling, & to review as much as
possible before moving on to other projects in the following sessions.
Students will review machine learning algorithms and be introduced to types of recommender systems, like collaborative
filtering with k-nearest neighbors, using either items or users, like Amazon’s. Students will then build their own
recommender system with the MovieLens dataset, elaborating on what to consider when selecting the best method &
how to fit it to the way viewers will use the recommendations; understand dimension reduction with PCA,
principal component analysis; explore SVM, support vector machines; and learn A/B testing with t-tests and p-values.
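Item-based collaborative filtering with nearest neighbors can be sketched as follows. The ratings dictionary is invented (it is not MovieLens data), cosine similarity is one common choice of similarity measure assumed here, and the helper names are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "B": 4},
}


def cosine_similarity(item_a, item_b):
    """Cosine similarity of two items over the users who rated both."""
    shared = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not shared:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in shared)
    norm_a = sqrt(sum(ratings[u][item_a] ** 2 for u in shared))
    norm_b = sqrt(sum(ratings[u][item_b] ** 2 for u in shared))
    return dot / (norm_a * norm_b)


def most_similar(item):
    """Nearest neighbor (k = 1) of an item among all other items."""
    others = {m for r in ratings.values() for m in r} - {item}
    return max(others, key=lambda other: cosine_similarity(item, other))


print(most_similar("A"), cosine_similarity("A", "B"), cosine_similarity("A", "C"))
```

Movies A and B are rated alike by the same users, so B is the nearest neighbor of A; a user-based variant would compare rows of users instead of columns of items.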
Students will explore the Natural Language Toolkit to process and extract text data: learning about tokenization of words
& sentences, part-of-speech tagging, & stemming & lemmatization for the best analysis of textual data. Students will
then start a Natural Language Processing project with Yelp data before we move on to Sentiment Analysis to predict
positive versus negative Yelp reviews.
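To give a feel for tokenization & sentiment scoring, here is a deliberately simplified stand-in that uses regular expressions instead of the Natural Language Toolkit. The review text and the tiny word lists are made up, and a real sentiment model would be trained on labeled reviews rather than on a hand-written lexicon.

```python
import re

# Made-up review text, not taken from the Yelp data.
review = "The tacos were amazing. Service was slow, but the staff were friendly!"

# Sentence tokenization: split on whitespace that follows ., ! or ?
sentences = re.split(r"(?<=[.!?])\s+", review.strip())

# Word tokenization: lowercase runs of letters (apostrophes kept).
words = re.findall(r"[a-z']+", review.lower())

# Hypothetical hand-written sentiment lexicon.
POSITIVE = {"amazing", "friendly"}
NEGATIVE = {"slow"}

score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
label = "positive" if score > 0 else "negative"

print(sentences, len(words), score, label)
```

The toolkit's own tokenizers handle abbreviations, contractions, & punctuation far more carefully than these two patterns do.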
Students will be introduced to Big Data and data engineering with the Hadoop ecosystem, the MapReduce paradigm,
Apache Spark, and the up-and-coming Splunk, where real-time data is represented in a dashboard format for easier
assessment. An existing project, such as MovieLens, will be transferred to AWS to expose students to the difference.
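The MapReduce paradigm can be sketched on a single machine with the classic word count. This runs in plain Python rather than on Hadoop or Spark, and the documents and function names are hypothetical; the point is only the three phases of map, shuffle, & reduce.

```python
from collections import defaultdict

# Made-up documents; in a cluster each one could live on a different node.
documents = ["big data big ideas", "data beats ideas", "big wins"]


def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_counts)
```

Because each map call only sees one document and each reduce call only sees one key, both phases can be spread across many machines, which is what the Hadoop ecosystem automates.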
Instructors will make sure that students’ understanding of unsupervised learning & supervised learning is clarified again &
explain where deep learning comes in. We will introduce deep learning through TensorFlow, training a neural network
and visualizing what a neural network has learned using TensorFlow Playground. Students will also learn about time series:
what makes them special & how to load and handle them in Pandas. Students will understand how seasonality affects
trends. Projects for this session include handwriting recognition & digital face recognition.
After initial installation, we will expand on why letting computers understand images is easier said than done
when compared to the way humans & their eyes process images. Then, students will be introduced to computer vision
fundamentals using OpenCV to detect faces, people, cars, and other objects, even when images are
rotated or scaled. Projects will use sensors such as students’ webcams to create a real-time facial recognition
program & an object recognition program.
In the last session, we will host a private Kaggle competition amongst the students. Students will be grouped into teams
and will showcase their group project at the end of class. This will also be a career day, where we will assess students on
their presentation skills, as well as their business skills in terms of the project.
Students will apply the Cross Industry Standard Process for Data Mining (CRISP-DM) to a provided data set to understand the process behind starting a new project. We will recommend that individual students tackle the aspects of CRISP-DM in which they need more practice, such as visualization, data understanding, or modeling.
Students will undertake a new project from start to finish. This project will allow students to demonstrate their skills in data acquisition, data cleaning, data enrichment, modeling, evaluation, and deployment.
As a third project, theDevMasters encourages students to bring their own data in their chosen domain for additional mentoring. Since these projects might entail more opinions & guidance, theDevMasters will proceed to transfer the third project over to the Mastermind group.