rw-book-cover

Metadata

Highlights

First lets talk about cost and dismiss the incorrect assumption that Hadoop is cheaper: Hadoop can be 3x cheaper for data refinement, but to build a data warehouse in Hadoop it can be 3x more expensive due to the cost of writing complex queries and analysis (based on a WinterCorp report and my experiences). (View Highlight)

Data lakes offer a rich source of data for data scientists and self-service data consumers (“power users”) and serves analytics and big data needs well. But not all data and information workers want to become power users (View Highlight)

traditional relational data warehouse should be viewed as just one more data source available to a user on some very large federated data fabric. It is just pre-compiled to run certain queries very fast. And a data lake is another data source for the right type of people. A data lake should not be blocked from all users so you don’t have to tell everyone “please wait three weeks while I mistranslate your query request into a new measure and three new dimensions in the data warehouse”. (View Highlight)

But most business users get lost in that morass. So, someone has to model the data so it makes sense to business users (View Highlight)