CS286 Graduate Database Systems @ Berkeley (Spring 2024)

Course Description

The Database Systems field has been exploring issues in data storage, management, processing and analysis for over 50 years. In our data-rich 21st century, the lessons of the field apply increasingly broadly across Computer Science and beyond.

In this course we will study a range of papers from the distant and recent past. Given the breadth of the field, it is not possible to cover it comprehensively. Our focus will largely be on core database systems topics, and while the emphasis will be on systems work we will not shy away from theoretical results where relevant. The readings will mix classic papers with emerging results, and revisit topics that may be ripe for revival given current technology trends.

Course Format

This is a graduate seminar, and student participation is a key part of the experience. The course will be held in-person, and regular attendance is mandatory.

The bulk of the course will involve learning from research papers, both classics and recent results. Students will be expected to read papers in advance of class, and participate in class discussion. Students will also undertake independent research as part of the course.

History

The syllabus from the last offering in 2020 is here.

Syllabus

DateSectionTopicLecture NotesPaper 1Paper 2Additional ResourcesAdditional ResourcesAdditional Resources
1/16/24Historical ContextHistory: Codd, System R, IngresNo classArchitecture of a Database SystemA Relational Model of DataA History and Evaluation of System R RDBMS Geneaology
1/18/24IBM Ur-TextsLecture 1: On the remarkable foresight of System RGranularity of Locks and Degrees of ConsistencyAccess Path Selection in a Relational Database Management SystemThe Design and Implementation of Ingres
1/23/24Berkeley Evolved: PostgresLecture 2: On the remarkable breadth of PostgresThe POSTGRES Next-Generation Database SystemThe Design of the Postgres Storage SystemLooking Back at Postgres
1/25/24Wisconsin: GammaLecture 3: On the underappreciated foresight of GammaThe Gamma Database Machine ProjectParallel Database Systems: The Future of High Performance Database SystemsGAMMA - A High Performance Dataflow Database MachineThe Architecture of the Exodus Extensible DBMS
1/30/24TransactionsConcurrency ControlLecture 4: Many things you ever wanted to know about Concurrency ControlConcurrency Control in Distributed Database SystemsNotes on Distributed Databases, Lindsay et al.
2/1/24CC PerformanceLecture 5, in which the professor celebrates and dunks upon empirical computer scienceConcurrency control performance modeling: alternatives and implicationsA Tale of 1000 CoresThe Full Story of 1000 CoresTalk slides for The Tale
2/6Isolation LevelsLecture 6: Isolation Levels, the mess made by one Turing winner, cleaned up with help from anotherGeneralized Isolation Level DefinitionsSeeing is believing: A client-centric specification of database isolation
2/8/24Snapshot IsolationLecture 7: Sleight of hand from Oracle and how to improve itSerializable Snapshot Isolation in PostgreSQLSerializable Isolation for Snapshot DatabasesMaking Snapshot Isolation Serializable
2/13QP and OptimizationCascadesLecture 8: Extensible Query Optimization can solve your problems too ! (aka Cascades explained...finally)Extensible Query Optimizers in PracticeVolcano -- An Extensible and Parallel Query Evaluation SystemsThe Cascades Framework for Query OptimizationEvita Raced: Metacompilation for Declarative Networks
2/15/24Adaptive QPLecture 9: What if we could change our mind every tuple?Eddies: Continuously Adaptive Query ProcessingLifting the Burden of History from Adaptive Query ProcessingAn Adaptive Query Execution System for Data IntegrationAn Initial Study of Overheads of Eddies
2/20/24Lecture 10: Paying LIP Service to AdaptivityLooking Ahead Makes Query Plans RobustSQLite: Past, Present and FutureSimple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis
2/22/24Optimal (Multiway) JoinsLecture 11: "Join Ordering isn't a Thing. WCOJ is a Thing!" Or is it the other way around?Free Join: Unifying Worst-Case Optimal and Traditional JoinsShorter version for SIGMOD Record: From Binary Join to Free JoinLeapfrog Trie Join: A Worst-Case Optimal JoinWorst-case Optimal Join Algorithms
2/27/24Datalog and Recursive QueriesLecture 12: Datalog, Without the SlogDatalog & Recursive Query Processing, pp. 106-135Max Willsey's course notes
3/5/24Lecture 13: Dedalus is Datalog in Time and SpaceDedalus: Datalog in Time and SpaceModern Datalog Engines, Sec 3.2The Declarative Imperative
3/7Coordination AvoidanceCAP and CALMLecture 14: CAP and CALMCAP Twelve Years Later: How the "Rules" Have ChangedKeeping CALM: When Consistency is EasyCoordination Avoidance in Database Systems
3/12LatticesLecture 15: Quicksand and ACID 2.0Keeping CALM: When Consistency is EasyBuilding on Quicksand
3/14Monotonic Programming: CRDTs, LVars, Bloom^LLecture 16: Forward, Programmers! Monotonic language constructsCRDTsLogic and Lattices for Distributed ProgrammingLVars: Lattice-Based Data Structures for Deterministic Parallelismcrdt.tech
3/19More Algebraic QPData provenance and beyondLecture 17: Your Provenance is a Polynomial!Provenance Semirings
3/21Incremental maintenanceLecture 18: Rolling with the Changes, (Abelian) group-wise (DBSP)DBSP: Automatic Incremental View Maintenance...
3/26/24Spring BreakR & RNo classXXX
3/28/24R & RNo classXXX
4/2/24Main Memory DBMSIn-memory Analytic Query ProcessingLecture 19: The European Resurgence in MM-DBMSsMonetDB/X100: Hyper-Pipelining Query ExecutionEfficiently Compiling Efficient Query Plans for Modern HardwareEverything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to AskAndy Pavlo' s DB engines course at CMUIncremental Fusion: Unifying Compiled and Vectorized Query Execution
4/4/24In-Memory Transaction ProcessingLecture 20: Coordinate Infrequently for in-Memory OLTPSpeedy Transactions in Multicore In-Memory Databases
4/9/24Project PrepNo class
4/11/24Guest: Eugene WuProvenance for RealSmoke: Fine-grained Lineage at Interactive SpeedProvenance for Interactive VisualizationsCalibration: A Simple Trick for Wide-table Delta Analytics
4/16/24Cloud DBMSOLAPLecture 22: What's so funny bout peace, love and query processing in the cloud?POLARIS: The Distributed SQL Engine in Azure SynapseDremel: A Decade of Interactive SQL Analysis at Web ScaleAmazon Redshift Re-invented
4/18/24OLTPLecture 23: Transactions in the Cloud are Hard. Maybe single-master and ship the log.Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational DatabasesPolarDB-SCC: A Cloud-Native Database Ensuring Low Latency for Strongly Consistent ReadsOcean Vista: Gossip-Based Visibility Control for Speedy Geo-Distributed Transactions
4/23/24Guest: Aditya ParameswaranInteractive Data, Ordered and UnorderedLecture 24: Aditya Parameswaran on Spreadsheets and DataframesTowards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data ManagementTowards Scalable Dataframe SystemsFlexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe SystemBenchmarking Spreadsheet Systems
4/25/24Final LecturePredictive Interaction and other AI/DB/HCI FuturesLecture 25: An AI/DB/HCI Petri Dish and the things that are growingPredictive Interaction for Data transformationInteractive Visual Specification of Data Transformation ScriptsSelf-Service Data Preparation: Research to Practice