KU-AICC202 Introduction to Data Science
Kathmandu University
Data Science
Course Overview
Harvard Business Review has called data scientist “the sexiest job of the 21st century”, and The Economist has argued that “the world’s most valuable resource is no longer oil, but data”. This course helps students understand the basic concepts of Data Science, such as extracting information from vast amounts of data using various scientific methods. It explains how raw and unstructured data are manipulated and processed to uncover useful underlying information. The course follows an example-based approach to show how data science techniques are applied to real-world problems. Furthermore, the course provides insight into the ethical use of data, which has become crucial as data is generated in ever greater volumes.
Programming Languages Used: Python, Julia, R
Course Objectives
After completion of the course, students should be able to:
- Explain the basic concepts and techniques of Data Science.
- Ask appropriate questions, create appropriate data visualizations, and provide insights based on them.
- Apply Data Science techniques for descriptive, diagnostic, predictive, and prescriptive analysis to real-world problems.
Prerequisites
Students are expected to have prior knowledge of mathematical preliminaries such as Probability and Statistics, Calculus, and Linear Algebra. In addition, students should know a high-level programming language such as C, C++, or Python in order to understand the concepts of Data Science and implement projects.
Chapter Breakdown
Chapter 1: Introduction to Data Science [6 Hrs.]
- Evolution of Science: Empirical, Theoretical, Computer Science and Data Science
- Current Trends in Data Science
- DIKW Pyramid
- Asking Good Questions
- Data Types and Properties
- Data Analytics Value Escalator
- Data Science Hierarchy
- Data Science Lifecycle
Chapter 2: Data Wrangling [9 Hrs.]
- Languages for Data Science
- Data Discovery
- Data Structuring
- Data Cleaning
- Data Transformation
- Data Sampling Techniques
- Data Enriching
- Feature Selection
- Data Validating
- Data Publishing
Chapter 3: Data to Models: Score, Rank, Class, Clusters [6 Hrs.]
- Why Model?
- Philosophies of Modeling
- Data to Score
  - The Body Mass Index (BMI)
  - Developing Scoring Systems
  - Linear Regression
  - Metrics
- Data to Rank
  - Advanced Ranking Techniques: PageRank, Elo Ranking, Machine-Learned Ranking (MLR)
  - Arrow’s Impossibility Theorem
  - Metrics
- Data to Class
  - Types of Classification
  - Metrics
- Data to Clusters
  - Types of Clustering
  - Cluster Validation
Chapter 4: Visualizing Data [6 Hrs.]
- Exploratory Data Analysis (EDA) and Explanatory Data Analysis
- Grammar of Graphics
- Developing a Visualization Aesthetic
- Principles of Visual Design
- Chart Types
- Interactive Visualization
- Common Tools for Data Visualization
- Storytelling with Data
- Misinformation and Disinformation
Chapter 5: Data Analysis [12 Hrs.]
- Descriptive Analysis
- Diagnostic Analysis
- Predictive Analysis
- Prescriptive Analysis
- Case Studies
- Optional
  - Process Analysis
  - Text Analysis
  - Qualitative Research & Analysis
Chapter 6: Big Data [6 Hrs.]
- Introduction to Big Data
- Big Data Ecosystem
- Parallelism
- Hadoop and MapReduce
- Apache Spark
Chapter 7: Future Trends in Data Science [3 Hrs.]
- Current Operational Trends
- Next-Generation Data Scientists
- Cultivating Soft Skills
- Being an Ethical Data Scientist
References
Steven S. Skiena. The Data Science Design Manual. Springer, Cham, 2017.
Cathy O’Neil, Rachel Schutt. Doing Data Science. O’Reilly Media, 2013.
Joel Grus. Data Science from Scratch: First Principles with Python, 2nd ed. O’Reilly Media, 2019.
Ivo D. Dinov. Data Science and Predictive Analytics. Springer, 2018.