Principles of Data Science
Module Number | 5107-411 |
Lecturer | Prof. Dr. Thomas Dimpfl |
Language | English |
Degree | Master's in International Business and Economics (first semester) |
Time and place | Lecture: Wednesday, 4 - 6 p.m. (room HSÖ1); Practical Class: Monday, 10 a.m. - 12 noon (room HS4) |
Exam | TBA |
Credits | 6 ECTS |
Start | Lecture: October 16th, 2024; Practical Class: October 14th, 2024 |
ILIAS Password | |
If you want to pursue the AI & Data Science Certificate Hohenheim (AIDAHO), the credits you earn in this course count toward your certificate. However, in addition to being a member of the ILIAS course, you must apply via the F.I.T. platform.
More information about the AIDAHO certificate can be found here.
Course Description
This course offers a comprehensive introduction to data science, combining conceptual understanding with hands-on experience. It is designed for students with a limited background in data science or programming, but it assumes basic knowledge of statistics and mathematics. It provides a practical foundation for further study and application in the field.
The course is structured around weekly lectures and accompanying practical sessions, where students gain experience using the R programming language for data analysis and visualization. The aim is to bridge theoretical concepts with practical implementation, ensuring students can not only understand core ideas but also apply them to real-world problems.
The key topics fall into three main pillars: data management, scientific computing, and data analysis.
In the data management component, students learn to identify and work with different types of data, including structured and unstructured formats, and explore various data sources. The course covers essential skills in data preprocessing, preparation, and cleaning, which are crucial steps for ensuring data quality before analysis. Students also learn how to handle messy or incomplete datasets, including issues such as missing values and inconsistent formats.
The scientific computing part introduces the basic principles of programming in R, emphasizing clean and efficient code, good workflow practices, and the importance of reproducibility in scientific work. Students learn how to organize their projects reproducibly and how to document their code and results effectively. We also introduce the basic Unix command-line skills needed to work on remote computing clusters, as well as version control with Git.
In the data analysis segment, students are introduced to statistical thinking and model evaluation techniques. They explore several analytical methods, including k-nearest neighbors (KNN), polynomial regression and splines, and introductory textual analysis. A strong emphasis is placed on interpreting results correctly and communicating them clearly, both visually and in writing. Students also gain initial exposure to artificial intelligence topics, giving them a sense of how data science and machine learning intersect. Along the way, we also discuss the ethical implications, risks, and pitfalls of data science projects.
The learning objectives of the course are to ensure that students:
- Can design and conduct a complete data science project;
- Understand common challenges and pitfalls across the data science workflow, from data acquisition to final interpretation;
- Are equipped to carry out basic analyses and to choose appropriate methods, building a foundation for future, more advanced work.
The assessment consists of two written assignments and a computer-based exam. The assignments are designed to help students apply the full data science workflow in a project setting. In the past, for example, students conducted sentiment analysis using user reviews from the Steam gaming platform, learning how to collect and process text data, build models, and interpret their findings. The assignments encourage students to be creative and independent in framing questions, selecting data, and communicating insights.
By the end of the course, students will have a solid understanding of the data science process and be prepared to apply these skills in academic or applied settings, as well as in subsequent data-oriented courses.