Metadata
- Author:: Aride Chettali
- Full Title:: Learnings From Streaming 25 Billion Events to Google BigQuery
- Category:: 🗞️Articles
- URL:: https://aride.medium.com/learnings-from-streaming-25-billion-events-to-google-bigquery-57ce81fa9898
- Finished date:: 2023-04-07
Highlights
But Spark did not have any connector that writes to BigQuery using the streaming APIs. All the connectors I evaluated wrote the data to a GCS bucket and then performed a batch load into BigQuery. Hence I decided to write a BigQuery streaming sink for Spark and use it for my PoC. (View Highlight)
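The article does not show the author's actual sink, but a minimal sketch of what one could look like, assuming Structured Streaming's `ForeachWriter` and the Java BigQuery client's legacy `insertAll` streaming API (class, dataset, and table names are placeholders):

```scala
import scala.jdk.CollectionConverters._
import com.google.cloud.bigquery.{BigQuery, BigQueryOptions, InsertAllRequest, TableId}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sink: streams each micro-batch row straight into BigQuery via
// the legacy streaming API (tabledata.insertAll) instead of staging in GCS.
class BigQueryStreamingSink(dataset: String, table: String) extends ForeachWriter[Row] {
  @transient private var bigquery: BigQuery = _

  override def open(partitionId: Long, epochId: Long): Boolean = {
    // Build the (non-serializable) client on the executor, once per partition.
    bigquery = BigQueryOptions.getDefaultInstance.getService
    true
  }

  override def process(row: Row): Unit = {
    // insertAll expects a column-name -> value map for each row.
    val content: java.util.Map[String, AnyRef] =
      row.schema.fieldNames.map(f => f -> row.getAs[AnyRef](f)).toMap.asJava
    val response = bigquery.insertAll(
      InsertAllRequest.newBuilder(TableId.of(dataset, table)).addRow(content).build())
    if (response.hasErrors)
      throw new RuntimeException(s"Streaming insert failed: ${response.getInsertErrors}")
  }

  override def close(errorOrNull: Throwable): Unit = ()
}
```

Wired up with `df.writeStream.foreach(new BigQueryStreamingSink("my_dataset", "events")).start()`; a production version would buffer rows and issue one `insertAll` per batch rather than per row.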
The default quota for streaming is a maximum of 1 GB per second per GCP project. Any ingestion above this limit results in a BigQueryException with a quotaExceeded error. (View Highlight)
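A sketch of one way to handle that error, assuming the Java client's `BigQueryException` surfaces the reason string `quotaExceeded` (the function name and retry policy are illustrative):

```scala
import com.google.cloud.bigquery.{BigQuery, BigQueryException, InsertAllRequest, InsertAllResponse}

// Hypothetical retry wrapper: backs off when the per-project streaming quota
// is exhausted and BigQuery raises a quotaExceeded error.
def insertWithBackoff(bigquery: BigQuery,
                      request: InsertAllRequest,
                      maxAttempts: Int = 5): InsertAllResponse = {
  var attempt = 0
  while (true) {
    try {
      return bigquery.insertAll(request)
    } catch {
      case e: BigQueryException
          if e.getError != null && e.getError.getReason == "quotaExceeded" =>
        attempt += 1
        if (attempt >= maxAttempts) throw e
        // Exponential backoff: 1s, 2s, 4s, ... before retrying the insert.
        Thread.sleep(1000L * (1L << (attempt - 1)))
    }
  }
  throw new IllegalStateException("unreachable")
}
```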
When I enabled “dedupe”, only one record was duplicated for every 5 million records ingested. Enabling deduplication does not guarantee 100% duplicate removal; rather, it is only a best-effort attempt to remove duplicates. (View Highlight)
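The deduplication referred to here is the legacy streaming API's `insertId` mechanism: BigQuery drops retransmissions that reuse the same id within a short window, on a best-effort basis. A sketch, assuming the Java client (dataset, table, and field names are placeholders):

```scala
import java.util.UUID
import scala.jdk.CollectionConverters._
import com.google.cloud.bigquery.{BigQueryOptions, InsertAllRequest, TableId}

val bigquery = BigQueryOptions.getDefaultInstance.getService
val tableId  = TableId.of("my_dataset", "events") // placeholder names

// Reuse the SAME insertId when retrying a failed insert, so BigQuery can
// recognize the row as a retransmission and (best-effort) drop the duplicate.
val insertId = UUID.randomUUID().toString
val row      = Map[String, AnyRef]("event" -> "click", "user_id" -> "42").asJava

val request = InsertAllRequest.newBuilder(tableId)
  .addRow(insertId, row) // overload taking (insertId, content)
  .build()

val response = bigquery.insertAll(request)
if (response.hasErrors) println(s"insert errors: ${response.getInsertErrors}")
```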
When you query a streaming table you end up reading data from the write-optimized streaming buffer, and that is the exact reason for the higher latency. (View Highlight)
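One way to see whether a table still has rows sitting in that buffer is the streaming-buffer statistics exposed through the client library's table metadata; a sketch with placeholder names:

```scala
import com.google.cloud.bigquery.{BigQueryOptions, StandardTableDefinition, TableId}

// Check whether a table still has rows in the write-optimized streaming
// buffer (the part of the table that is slower to scan than managed storage).
val bigquery = BigQueryOptions.getDefaultInstance.getService
val table    = bigquery.getTable(TableId.of("my_dataset", "events"))

val buffer = table
  .getDefinition[StandardTableDefinition]
  .getStreamingBuffer // null once everything has been flushed

if (buffer != null)
  println(s"~${buffer.getEstimatedRows} rows / ~${buffer.getEstimatedBytes} bytes still in the streaming buffer")
else
  println("streaming buffer empty; data fully committed to columnar storage")
```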