CSc 4780/6780 Fundamentals of Data Science

Administrative Info

Instructor: Berkay Aydin 

Email: baydin2@gsu.edu


Course Overview

Credit Hours: 4.0

Class Policies: Available on iCollege.

Pre-requisites: CS 2720 with a grade of “C” or higher

Textbook: Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies by Kelleher et al. (ISBN: 978-0-26-202944-5)

Description: This course provides a detailed, hands-on introduction to a few highly popular but basic supervised machine learning algorithms and teaches students how to use them in practice. These algorithms are taught only after multiple data preprocessing and visualization techniques are well understood. The course is intended as an introduction to more advanced data science-related courses, such as Data Mining, Machine Learning, and Big Data Programming. Its primary goal is to teach students the basics of data preprocessing and supervised machine learning, while exposing them to real-life problems through case analysis and data science-oriented programming.

Outcomes: This course provides a hands-on introduction to fundamental supervised machine learning algorithms. Upon successful completion, the student will be able to (1) understand different data preprocessing strategies and know how to handle different data types and distributions; (2) visualize and explain data trends and the insights generated through data processing and basic supervised machine learning methodologies; (3) choose basic machine learning methodologies suited to a given supervised learning task; and (4) apply the chosen methodologies to real-life data and perform comparative evaluations of these approaches while learning from the data.

Requirements: Students are expected to have at least basic programming skills in Python (see prerequisites) and basic mathematical skills, specifically basic probability.

Grading

Grade     CSc 4780      CSc 6780
A+        [100, ∞)      [100, ∞)
A         [95, 100)     [96, 100)
A-        [90, 95)      [92, 96)
B+        [85, 90)      [88, 92)
B         [80, 85)      [83, 88)
B-        [75, 80)      [79, 83)
C+        [70, 75)      [75, 79)
C         [65, 70)      [70, 75)
C-        [60, 65)      [66, 70)
D         [50, 60)      [60, 66)
F         [0, 50)       [0, 60)

Course Outline

Week 1                Introduction to Fundamentals of Data Science + Python Tutorial                                      

Week 2                Machine Learning for Predictive Data Analytics 

Week 3                Data to Insights to Decisions 

Week 4                Data Exploration 

Week 5                Data Pre-processing 

Week 6                Data Presentation 

Week 7                Midterm Exam 

Week 8                Information-based Learning 

Week 9                Similarity-based Learning 

Week 10               Probability-based Learning

Week 11               Error-based Learning

Week 12               Feature Selection

Week 13               Model Evaluation

* The last two weeks of the term will include the project presentations and the final exam, respectively.