Data Management Systems

Lecturer: Gustavo Alonso

Teaching Assistants

  • Michal Friedman
  • Dan-Ovidiu Graur
  • Dario Korolija 
  • Dimitrios Koutsoukos
  • Michael Wawrzoniak

Lectures

  • Wednesday 10:00 - 12:00 CAB G61
  • Friday 8:00 - 9:00 HG G3
    Note: Lectures will be recorded, but not live-streamed.
  • Lecture recordings

Exercises

  • Friday 9:00-10:00 HG D 5.1
  • Friday 9:00-10:00 HG G 26.1
  • Friday 9:00-10:00 HG G 26.5
    Note: Exercise sessions will begin on October 7th.

Contact

Please use the Moodle Q&A forum to ask questions outside of lectures and exercise sessions. If you have private questions for the instructors or TAs, please send an email to

Announcements and exercises will be handled through Moodle.

[ Course Moodle ]

Course contents

The course will cover the implementation aspects of data management systems. It uses relational database engines as a starting point to cover the basic concepts of efficient data processing, and then expands those concepts to modern implementations in data centers and the cloud.

The goal of the course is to convey the fundamental aspects of efficient data management from a systems implementation perspective: storage, access, organization, indexing, consistency, concurrency, transactions, distribution, query compilation vs. interpretation, data representations, etc. Using conventional relational engines as a starting point, the course aims to provide in-depth coverage of the latest technologies used in data centers and the cloud to implement large-scale data processing in various forms.

The course will first cover fundamental concepts in data management: storage, locality, query optimization, declarative interfaces, concurrency control and recovery, buffer managers, and management of the memory hierarchy, presenting them in a system-independent manner. The course will place a special emphasis on understanding these basic principles, as they are key to understanding what problems existing systems try to address. It will then explore their implementation in modern relational engines supporting SQL, and finally expand to the range of systems used in the cloud: key-value stores, geo-replication, query as a service, serverless, large-scale analytics engines, etc.

The main source of information for the course will be articles and research papers describing the architecture of the systems discussed. The list of papers will be provided as the materials for each chapter of the course are released.

The course will be recorded but not streamed online. The course will have no project or practical component. We will focus on the key architectural aspects and on surveying the literature on data management systems architecture. The time that would otherwise have been devoted to programming will instead be invested in looking deeper at how systems are constructed and at the algorithms behind many of the optimizations used in real systems. Freed from development work, students are expected to invest the necessary time in reading the provided articles and books to gain an understanding of the material.

Syllabus

Lecture schedule
 

Lecture slides are available via the course Moodle site.

 

Reading assignments
 

Teaching format

  • Lectures will be recorded (slides and voice).
  • Homework will be handled through Moodle.
  • Exam will be handled through Moodle.

     