
## Metadata
- Author:: [[imply|Imply]]
- Full Title:: Druid Architecture & Concepts
- Category:: #🗞️Articles
- URL:: https://imply.io/druid-architecture-concepts/
- Finished date:: [[2023-04-03]]
## Highlights
> In 2011, the data team at a technology company had a problem. They needed to quickly aggregate and query real-time data coming from website users across the Internet to analyze digital advertising auctions. This created large data sets, with millions or billions of rows.
> They first implemented their product using relational databases, starting with Greenplum, a fork of PostgreSQL. It worked, but needed many more machines to scale, and that was too expensive.
> They then used the NoSQL database HBase, populated from Hadoop MapReduce jobs. These jobs took hours to build the aggregations necessary for the product. At one point, adding only 3 dimensions on a data set that numbered in the low millions took the processing time from 9 hours to 24 hours.
> So, in the words of Eric Tschetter, one of Druid’s creators, “we did something crazy: we rolled our own database!” ([View Highlight](https://read.readwise.io/read/01gx2q2qkwgb7knyd7mq4gw3c6))
> Druid gets both performance and cost advantages by storing the segments on cloud storage and also pre-fetching them so they are ready when requested by the query engine. ([View Highlight](https://read.readwise.io/read/01gx2q3zrfnf3xp5m8zp9kqbft))
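The pre-fetching idea above can be sketched in a few lines: segments sit in cheap deep storage, and a local cache copies them to fast disk ahead of query time so the query engine never blocks on a remote read. This is a minimal illustrative sketch, not Druid's actual implementation; the `DEEP_STORAGE` dict, `SegmentCache` class, and segment IDs are all hypothetical stand-ins.

```python
import os
import tempfile

# Hypothetical stand-in for deep storage (e.g. S3): segment ID -> bytes.
DEEP_STORAGE = {
    "wiki_2023-04-01": b"segment-bytes-1",
    "wiki_2023-04-02": b"segment-bytes-2",
}

class SegmentCache:
    """Sketch of a pre-fetching segment cache on local disk."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def _local_path(self, segment_id):
        return os.path.join(self.cache_dir, segment_id)

    def prefetch(self, segment_id):
        # Copy the segment from deep storage to local disk (idempotent),
        # so it is already warm when a query arrives.
        path = self._local_path(segment_id)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(DEEP_STORAGE[segment_id])
        return path

    def query(self, segment_id):
        # Serve from the local copy; fall back to an on-demand fetch on a miss.
        path = self._local_path(segment_id)
        if not os.path.exists(path):
            self.prefetch(segment_id)
        with open(path, "rb") as f:
            return f.read()

cache = SegmentCache(tempfile.mkdtemp())
for seg_id in DEEP_STORAGE:      # pre-fetch everything before queries arrive
    cache.prefetch(seg_id)
data = cache.query("wiki_2023-04-01")  # served from local disk, no remote read
```

The cost advantage in the quote comes from deep storage being cheap and durable, while the performance advantage comes from queries hitting the warm local copy.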
>
> (Image: architecture diagram from the article; image URL mangled in export) ([View Highlight](https://read.readwise.io/read/01gx2q8w5jzrqta5tgvck82m11))