Principles of Data Science

Module Number: 5107-411
Lecturer: Prof. Dr. Thomas Dimpfl
Language: English
Degree: First-semester master in International Business and Economics
Time and place

Lecture: Wednesday, 4 - 6 p.m. (room HSÖ1)

Practical Class: Monday, 10 a.m. - 12 noon (room HS4)

Exam: tba
Credits: 6 ECTS
Begin

Lecture: October 16th, 2024

Practical Class: October 14th, 2024

ILIAS Password

If you want to pursue the AI & Data Science Certificate Hohenheim (AIDAHO), the credits you earn in this course will count toward the certificate. In addition to being a member of the ILIAS course, however, you have to apply via the F.I.T. platform.

More information about the AIDAHO certificate can be found here.

Course Description

This course offers a comprehensive introduction to data science, combining conceptual understanding with hands-on experience. It is designed for students with a limited background in data science or programming, but requires basic statistics and math knowledge. It provides a practical foundation for further study and application in the field.

The course is structured around weekly lectures and accompanying practical sessions, where students gain experience using the R programming language for data analysis and visualization. The aim is to bridge theoretical concepts with practical implementation, ensuring students can not only understand core ideas but also apply them to real-world problems.

The course rests on three main pillars: data management, scientific computing, and data analysis.

In the data management component, students learn to identify and work with different types of data, including structured and unstructured formats, and explore various data sources. The course covers essential skills in data preprocessing, preparation, and cleaning—crucial steps to ensure data quality before analysis. Students also learn how to manage messy or incomplete datasets and how to handle issues like missing values or inconsistent formats.

The scientific computing part introduces the basic principles of programming in R, emphasizing clean and efficient code, good workflow practices, and the importance of reproducibility in scientific work. Students learn how to organize their projects in a reproducible manner and how to document their code and results effectively. We also introduce some basic Unix programming needed to work on remote clusters, as well as version control with Git.

In the data analysis segment, students are introduced to statistical thinking and model evaluation techniques. They explore several analytical methods, including k-nearest neighbors (KNN), polynomial regression and splines, and introductory textual analysis. A strong emphasis is placed on interpreting results correctly and communicating them clearly, both visually and in writing. Students also gain initial exposure to artificial intelligence topics, giving them a sense of how data science and machine learning intersect. Along the way, we also discuss the ethical implications, risks, and pitfalls of data science projects. 

The learning objectives of the course are to ensure that students:
- Can design and conduct a complete data science project;
- Understand common challenges and pitfalls across the data science workflow—from data acquisition to final interpretation;
- Are equipped to carry out basic analyses and to choose appropriate methods, building a foundation for future, more advanced work.

The assessment consists of two written assignments and a computer-based exam. The assignments are designed to help students apply the full data science workflow in a project setting. In the past, for example, students conducted sentiment analysis using user reviews from the Steam gaming platform, learning how to collect and process text data, build models, and interpret their findings. The assignments encourage students to be creative and independent in framing questions, selecting data, and communicating insights.

By the end of the course, students will have a solid understanding of the data science process and be prepared to apply these skills in academic or applied settings, as well as in subsequent data-oriented courses.