CSC4780/6780 Fundamentals of data science
Administrative Info
Instructor: Berkay Aydin
Email: baydin2@gsu.edu
Course Overview
Credit Hours: 4.0
Class Policies: Available on iCollege.
Prerequisites: CS 2720 with a grade of “C” or higher
Textbook: Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies by Kelleher et al. (ISBN: 978-0-262-02944-5)
Description: This course provides a detailed, hands-on introduction to a few widely used but basic supervised machine learning algorithms and teaches students how to apply them in practice. These algorithms are covered only after students have a solid grasp of several data preprocessing and visualization techniques. The course is intended as a foundation for more advanced data science courses, such as Data Mining, Machine Learning, and Big Data Programming. Its primary goal is to teach the basics of data preprocessing and supervised machine learning while exposing students to real-life problems through case analyses and data science-oriented programming.
Outcomes: This course provides a hands-on introduction to fundamental supervised machine learning algorithms. Upon successful completion, students will be able to (1) understand different data preprocessing strategies and handle different data types and distributions, (2) visualize and explain data trends and the insights generated through data processing, as well as through basic supervised machine learning methods, (3) choose appropriate basic machine learning methods for a given supervised learning task, and (4) apply the chosen methods to real-life data and perform comparative evaluations of these approaches.
Requirements: Students are expected to have at least basic programming skills in Python (see prerequisites) and basic mathematical skills, particularly in probability.
Grading
Grade   CSC 4780 score   CSC 6780 score
A+      [100, ∞)         [100, ∞)
A       [95, 100)        [96, 100)
A-      [90, 95)         [92, 96)
B+      [85, 90)         [88, 92)
B       [80, 85)         [83, 88)
B-      [75, 80)         [79, 83)
C+      [70, 75)         [75, 79)
C       [65, 70)         [70, 75)
C-      [60, 65)         [66, 70)
D       [50, 60)         [60, 66)
F       [0, 50)          [0, 60)
Course Outline
Week 1 Introduction to Fundamentals of Data Science + Python Tutorial
Week 2 Machine Learning for Predictive Data Analytics
Week 3 Data to Insights to Decisions
Week 4 Data Exploration
Week 5 Data Pre-processing
Week 6 Data Presentation
Week 7 Midterm Exam
Week 8 Information-based Learning
Week 9 Similarity-based Learning
Week 10 Probability-based Learning
Week 11 Error-based Learning
Week 12 Feature Selection
Week 13 Model Evaluation
* The last two weeks of the term are reserved for project presentations and the final exam, respectively.