KU-AICC202-Intro-to-Data-Science
Kathmandu University Department of Computer Science
Subject: Introduction to Data Science
Course Code: AICC 202
Level: BTech in AI, 2nd year, 1st semester
Credit Hours: 3
Type: Core [Theory + Practical]
Course Description
Data Science has been called “the sexiest job of the 21st century” by Harvard Business Review, and it is now often said that “the world’s most valuable resource is no longer oil, but data”. This course helps students understand the basic concepts of Data Science, such as extracting information from vast amounts of data using scientific methods. It explains how data is manipulated and processed to uncover useful underlying information in raw and unstructured data. The course follows an example-based approach to show how data science techniques are applied to real-world problems. It also covers the ethical use of data, which has become crucial given how much data is now generated.
Programming Languages Used: Python, Julia, R
Course Objectives
- To introduce the basic concepts and techniques of Data Science.
- To give insight into data visualization.
- To understand the current state and applications of Data Science.
- To understand how Data Science techniques are used in real-world problems.
Prerequisites
Students are expected to have prior knowledge of mathematical preliminaries such as Probability and Statistics, Calculus, and Linear Algebra. Students should also know a high-level programming language such as C, C++, or Python in order to understand the concepts of Data Science and implement projects.
Course Evaluation
Internal Examination: 50%
Final Examination: 50%
Chapters
- Evolution of Science: Empirical, Theoretical, Computer Science and Data Science
- Current Trends in Data Science
- DIKW Pyramid
- Asking Good Questions
- Data Types and Properties
- Data Analytics Value Escalator
- Data Science Hierarchy
- Data Science Lifecycle
Practical:
- Languages for Data Science
- Data Discovery
- Data Structuring
- Data Cleaning
- Data Transformation
- Data Sampling Techniques
- Data Enriching
- Feature Selection
- Data Validating
- Data Publishing
- Why Model?
- Data to Score
- The Body Mass Index (BMI)
- Developing Scoring Systems
- Linear Regression
- Metrics
- Data to Rank
- Advanced Ranking Techniques: PageRank, Elo Ranking, Machine-learned ranking (MLR)
- Arrow’s Impossibility Theorem
- Metrics
- Data to Class
- Types of Classification
- Metrics
- Data to Clusters
- Types of Clustering
- Cluster Validation
- Exploratory (EDA) and Explanatory Data Analysis
- Grammar of Graphics
- Developing a Visualization Aesthetic
- Chart Types
- Data Visualization Perspectives
- Principles of Visual Design
- Interactive Visualization
- Common tools for Data Visualization
- Storytelling with data
- Misinformation and Disinformation
- Descriptive Analysis
- Diagnostic Analysis
- Predictive Analysis
- Prescriptive Analysis
- Case Studies
Optional
- Process Analysis
- Text Analysis
- Qualitative Research & Analysis
- Introduction to Big Data
- Big Data Ecosystem
- Parallelism
- Hadoop and MapReduce
- Apache Spark
- Current Operational trends
- Next-Generation Data Scientists
- Cultivating Soft Skills
- Being an Ethical Data Scientist
Text Books
- Steven S. Skiena. The Data Science Design Manual. Springer Cham, 2017
- Cathy O’Neil, Rachel Schutt. Doing Data Science. O’Reilly Media, Inc., 2013
Reference Books:
- Joel Grus. Data Science from Scratch: First Principles with Python, 2nd ed. O’Reilly Media, Inc., 2019
- Ivo D. Dinov. Data Science and Predictive Analytics. Springer, 2018