New highlights added 2023-04-03
While Druid contains many features commonly found in search systems, such as the ability to stream in structured and semi-structured data and the ability to search and filter the data, Druid isn’t commonly used to ingest text logs and run full-text search queries over the text logs.
Druid creates an indexed copy of raw data that is highly optimized for analytic queries. Druid runs queries over this indexed data, called a ‘segment’ in Druid, and does not pull raw data from an external storage system as needed by queries.
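To make the "indexed copy" idea concrete, here is a toy sketch in Python. It is not Druid's segment format. The names `build_segment` and `sum_where` are hypothetical, the rows are made up, and the sketch assumes every row has the same keys. It only mirrors the general idea: raw rows are copied once into a column-oriented structure with an index on a dimension, and queries then read that copy instead of going back to the raw source.

```python
from collections import defaultdict


def build_segment(rows, dimension):
    """Copy raw rows into a toy 'segment': per-column lists plus an
    inverted index mapping each value of `dimension` to its row ids."""
    columns = defaultdict(list)
    inverted = defaultdict(list)
    for row_id, row in enumerate(rows):
        for key, value in row.items():
            columns[key].append(value)
        inverted[row[dimension]].append(row_id)
    return {"columns": dict(columns), "index": dict(inverted)}


def sum_where(segment, metric, value):
    """Sum `metric` over rows whose indexed dimension equals `value`,
    touching only the indexed copy, never the raw rows."""
    row_ids = segment["index"].get(value, [])
    metric_column = segment["columns"][metric]
    return sum(metric_column[i] for i in row_ids)


# Illustrative raw events (invented for this sketch).
raw_rows = [
    {"page": "home", "country": "US", "clicks": 3},
    {"page": "docs", "country": "DE", "clicks": 5},
    {"page": "home", "country": "DE", "clicks": 2},
]

segment = build_segment(raw_rows, "page")
home_clicks = sum_where(segment, "clicks", "home")
```

The filter is answered from the index (row ids 0 and 2 for `"home"`), and only the `clicks` column is read. Real Druid segments are far more elaborate, but the query path has the same shape.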
Where does Druid fit in my big data stack?
A common streaming data oriented setup involving Druid looks like this:
Raw data → Kafka → Stream processor (optional, typically for ETL) → Kafka (optional) → Druid → Application/user

A common batch/static file oriented setup involving Druid looks like this:
Raw data → Kafka (optional) → HDFS → ETL process (optional) → Druid → Application/user
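For the streaming setup, the Kafka → Druid hop is usually configured with a Kafka ingestion supervisor spec. The Python sketch below only builds such a spec as a dictionary and sends nothing. The helper `kafka_supervisor_spec`, the datasource and topic names, the broker address, and the `timestamp` column are all assumptions for illustration. The field layout follows my understanding of the Kafka supervisor spec and should be checked against the reference for your Druid version.

```python
import json


def kafka_supervisor_spec(datasource, topic, bootstrap_servers, dimensions):
    """Build a minimal Kafka supervisor spec as a plain dict (hypothetical helper)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# All values below are placeholders, not taken from the highlighted text.
spec = kafka_supervisor_spec(
    datasource="clickstream",
    topic="clickstream-events",
    bootstrap_servers="localhost:9092",
    dimensions=["page", "country"],
)
payload = json.dumps(spec, indent=2)
```

In a real deployment the serialized `payload` would be submitted to the supervisor endpoint on the Overlord. That step is left out here so the sketch runs without a cluster.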