CSC 4780/6780 Fundamentals of Data Science

Administrative Info

Instructor: Berkay Aydin

Email: baydin2@gsu.edu


Course Overview

Credit Hours: 4.0 hours

Class Policies: Available at https://grid.cs.gsu.edu/~baydin2/courses/policy.html

Pre-requisites: CS 2720 with a grade of “C” or higher

Textbook: Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies by Kelleher et al. (ISBN: 978-0-26-202944-5)

Description: The goal of this course is to provide a detailed, hands-on introduction to a few highly popular but basic supervised machine learning algorithms and to teach students how to use them in practice. These techniques are taught only after multiple data preprocessing and visualization techniques are well understood. This course is intended as an introduction to more advanced data science-related courses, such as Data Mining, Machine Learning, and Big Data Programming. Students learn the basics of data preprocessing and supervised machine learning and are exposed to real-life problems via case analysis and data science-oriented programming.

Outcomes: This course provides a hands-on introduction to fundamental supervised machine learning algorithms. Upon successful completion, the student will be able to (1) understand different data preprocessing strategies and know how to deal with different data types and distributions, (2) visualize and explain data trends and the insights generated through data processing and basic supervised machine learning methodologies, (3) choose basic machine learning methodologies suited to a given supervised learning task, and (4) employ the chosen machine learning methodologies on real-life data and perform comparative evaluations of these approaches while learning from these data.

Requirements: Students are expected to have at least basic programming skills in Python (see prerequisites) and a basic mathematical background, specifically in basic probability.

Grading

Grade   CSc 4780    CSc 6780
A+      [100, ∞)    [100, ∞)
A       [95, 100)   [96, 100)
A-      [90, 95)    [92, 96)
B+      [85, 90)    [88, 92)
B       [80, 85)    [83, 88)
B-      [75, 80)    [79, 83)
C+      [70, 75)    [75, 79)
C       [65, 70)    [70, 75)
C-      [60, 65)    [66, 70)
D       [50, 60)    [60, 66)
F       [0, 50)     [0, 60)

Course Outline

Week 1 Introduction to Fundamentals of Data Science + Python Tutorial

Week 2 Machine Learning for Predictive Data Analytics

Week 3 Data to Insights to Decisions

Week 4 Data Exploration

Week 5 Data Pre-processing

Week 6 Data Presentation

Week 7 Midterm Exam

Week 8 Information-based Learning

Week 9 Similarity-based Learning

Week 10 Probability-based Learning

Week 11 Error-based Learning

Week 12 Feature Selection

Week 13 Model Evaluation

* The last two weeks of the term will include the project presentations and the final exam, respectively.