Cloud Data Warehousing

Reminder: Key aspects of DW workloads:

Analytics/Data Warehousing conventional wisdom today:

We know about Gamma. We know about MapReduce. What changes in the Cloud?

  1. Delegate components to good-enough subsystems, and to the dev and ops teams behind them
    1. The cheapest storage is shared object storage (e.g. S3). High latency but high bandwidth, which suits the large sequential scans of DW workloads!
      1. Also handles encryption/auth in a unified setting
    2. Distributed resource allocation, management, and membership are handled by Kubernetes (K8s) and similar systems
  2. Elasticity is available: what shall we use it for and how?
  3. Need to address global reach and geolatencies.
  4. Lots of shared work across queries/users
  5. Diversity of HW
  6. SLOs, e.g. “best effort” vs. “reserved”
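
A back-of-envelope sketch of why high-latency, high-bandwidth object storage (item 1.1 above) can still suit DW scans. The model is deliberately simple: each request pays a fixed latency plus size divided by bandwidth. The latency and bandwidth figures, and the function names, are illustrative assumptions for this sketch, not measurements of S3 or any other service.

```python
def fetch_seconds(size_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Time for one request: fixed latency plus transfer time (size / bandwidth)."""
    return latency_s + size_bytes / bandwidth_bps


def latency_share(size_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Fraction of the request time spent waiting on latency rather than moving data."""
    return latency_s / fetch_seconds(size_bytes, latency_s, bandwidth_bps)


# Assumed, illustrative numbers (not measured):
LATENCY_S = 0.05        # 50 ms until the first byte arrives
BANDWIDTH_BPS = 100e6   # 100 MB/s for a single request

# OLTP-style point read of one small page vs. a DW-style read of a large column chunk.
point_read_share = latency_share(4 * 1024, LATENCY_S, BANDWIDTH_BPS)
scan_chunk_share = latency_share(64 * 1024**2, LATENCY_S, BANDWIDTH_BPS)

print(f"4 KiB point read: {point_read_share:.1%} of the time is latency")
print(f"64 MiB scan chunk: {scan_chunk_share:.1%} of the time is latency")
```

Under these assumptions the small read is almost entirely latency, while the large read is dominated by transfer time. Issuing many large reads in parallel would hide the remaining latency further, which is one way to spend elasticity.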

What technical challenges arise?

What are some Design Space considerations?