Description
Data Science has emerged as one of the most influential interdisciplinary fields
of the modern digital era, playing a vital role in transforming raw data into
meaningful insights and informed decisions. With the rapid growth of data generated
from diverse sources such as social media, business transactions, sensors, and online
platforms, there is an increasing need for systematic methods to analyze, interpret,
and utilize data effectively. This book follows the syllabus of the University of
Madras B.Sc. Degree Programme in Computer Science (with effect from 2023–2024)
and provides a comprehensive, structured introduction to the fundamental concepts,
techniques, and methodologies of Data Science.
The primary objective of this textbook is to support both teaching and learning by
clearly addressing the prescribed learning objectives of the course. The content
enables teachers to guide students through essential data operations, introduce
simple statistical models, and explain the fundamentals of machine learning
techniques with a focus on regression. Emphasis is placed on promoting good data
science practices, developing practical skills using tools such as Python and
integrated development environments, and building a strong conceptual
understanding of supervised learning techniques through classroom and laboratory
activities.
This book is also structured to help students achieve the intended course outcomes.
It provides systematic guidance on cleaning and reshaping messy datasets, applying
exploratory data analysis techniques such as clustering and visualization, and
performing linear regression analysis. In addition, the text introduces widely used
classification methods including logistic regression, nearest neighbours, decision
trees, support vector machines, and neural networks. Dimensionality reduction
techniques, particularly principal component analysis, are also explained to help
students manage high-dimensional data effectively.




