A concept underlying the move to a data lake architecture (read: keeping your data in its rawest form, and its transforms, in S3 or HDFS) is decoupling your compute from your storage.
Motivating example: you have huge tables in Redshift that are either infrequently accessed or whose usefulness decays over time (website logs, customer order information). In this scenario you're paying a lot just to keep data in Redshift (storage) while a large subset of the data is lying dormant (no compute).
If you're bought into the Redshift ecosystem, this is where Redshift Spectrum comes in. If you're a smaller company, you could just store the data in S3 and "spin up" the compute when you need it (Athena, Glue jobs, or Elastic MapReduce clusters).
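To make the "store in S3, spin up compute when you need it" idea concrete, here's a minimal sketch using boto3 and Athena. It assumes the S3 data is already registered as an external table in the Glue catalog; the bucket, database, and table names are hypothetical placeholders.

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Kick off a query against data that's already sitting in S3 and registered
# in the Glue/Athena catalog as an external table (names are placeholders).
response = athena.start_query_execution(
    QueryString=(
        "SELECT page, COUNT(*) AS hits FROM website_logs "
        "WHERE log_date >= DATE '2023-01-01' GROUP BY page"
    ),
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes. You pay per TB scanned at query time,
# not for an always-on cluster holding the data.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

print(f"Query {query_id} finished with state {state}")
```

The point of the sketch is that the storage (S3) never moves: compute is attached only for the duration of the query, which is the decoupling the paragraph above describes.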
For those of us not actually working at AWS, Redshift gets insanely expensive when your data set grows into the terabytes. Analytics on S3 is much more cost-effective using Athena, Snowflake, or old-fashioned EMR as your data grows.