
## Metadata
- Author:: [[imply|Imply]]
- Full Title:: Druid Architecture & Concepts
- Category:: #🗞️Articles
- URL:: https://imply.io/druid-architecture-concepts/
- Finished date:: [[2023-04-03]]
## Highlights
> In 2011, the data team at a technology company had a problem. They needed to quickly aggregate and query real-time data coming from website users across the Internet to analyze digital advertising auctions. This created large data sets, with millions or billions of rows.
> They first implemented their product using relational databases, starting with Greenplum, a fork of PostgreSQL. It worked, but needed many more machines to scale, and that was too expensive.
> They then used the NoSQL database HBase, populated from Hadoop MapReduce jobs. These jobs took hours to build the aggregations necessary for the product. At one point, adding only 3 dimensions on a data set that numbered in the low millions took the processing time from 9 hours to 24 hours.
> So, in the words of Eric Tschetter, one of Druid’s creators, “we did something crazy: we rolled our own database!” ([View Highlight](https://read.readwise.io/read/01gx2q2qkwgb7knyd7mq4gw3c6))
> Druid gets both performance and cost advantages by storing the segments on cloud storage and also pre-fetching them so they are ready when requested by the query engine. ([View Highlight](https://read.readwise.io/read/01gx2q3zrfnf3xp5m8zp9kqbft))
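The pre-fetching idea above can be sketched in a few lines: segments sit in cheap deep storage, and a local cache copies them to fast disk ahead of query time so the query engine never blocks on a remote read. This is a minimal illustrative sketch, not Druid's actual implementation; the `DEEP_STORAGE` dict, `SegmentCache` class, and segment IDs are all hypothetical stand-ins.

```python
import os
import tempfile

# Hypothetical stand-in for deep storage (e.g. S3): segment ID -> bytes.
DEEP_STORAGE = {
    "wiki_2023-04-01": b"segment-bytes-1",
    "wiki_2023-04-02": b"segment-bytes-2",
}

class SegmentCache:
    """Sketch of a pre-fetching segment cache on local disk."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def _local_path(self, segment_id):
        return os.path.join(self.cache_dir, segment_id)

    def prefetch(self, segment_id):
        # Copy the segment from deep storage to local disk (idempotent),
        # so it is already warm when a query arrives.
        path = self._local_path(segment_id)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(DEEP_STORAGE[segment_id])
        return path

    def query(self, segment_id):
        # Serve from the local copy; fall back to an on-demand fetch on a miss.
        path = self._local_path(segment_id)
        if not os.path.exists(path):
            self.prefetch(segment_id)
        with open(path, "rb") as f:
            return f.read()

cache = SegmentCache(tempfile.mkdtemp())
for seg_id in DEEP_STORAGE:      # pre-fetch everything before queries arrive
    cache.prefetch(seg_id)
data = cache.query("wiki_2023-04-01")  # served from local disk, no remote read
```

The cost advantage in the quote comes from deep storage being cheap and durable, while the performance advantage comes from queries hitting the warm local copy.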
>
> (Image: architecture diagram from the article; image URL mangled in export) ([View Highlight](https://read.readwise.io/read/01gx2q8w5jzrqta5tgvck82m11))