Metadata
- Author:: Aride Chettali
- Full Title:: Learnings From Streaming 25 Billion Events to Google BigQuery
- Category:: 🗞️Articles
- URL:: https://aride.medium.com/learnings-from-streaming-25-billion-events-to-google-bigquery-57ce81fa9898
- Finished date:: 2023-04-07
Highlights
But Spark did not have any connector that writes to BigQuery using the streaming APIs. All the connectors I evaluated wrote the data to a GCS bucket and then performed a batch load into BigQuery. Hence I decided to write a BigQuery streaming sink for Spark and use it for my PoC. (View Highlight)
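The article does not show the author's actual sink, but a minimal sketch of what one could look like, assuming Structured Streaming's `ForeachWriter` and the Java BigQuery client's legacy `insertAll` streaming API (class, dataset, and table names are placeholders):

```scala
import scala.jdk.CollectionConverters._
import com.google.cloud.bigquery.{BigQuery, BigQueryOptions, InsertAllRequest, TableId}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sink: streams each micro-batch row straight into BigQuery via
// the legacy streaming API (tabledata.insertAll) instead of staging in GCS.
class BigQueryStreamingSink(dataset: String, table: String) extends ForeachWriter[Row] {
  @transient private var bigquery: BigQuery = _

  override def open(partitionId: Long, epochId: Long): Boolean = {
    // Build the (non-serializable) client on the executor, once per partition.
    bigquery = BigQueryOptions.getDefaultInstance.getService
    true
  }

  override def process(row: Row): Unit = {
    // insertAll expects a column-name -> value map for each row.
    val content: java.util.Map[String, AnyRef] =
      row.schema.fieldNames.map(f => f -> row.getAs[AnyRef](f)).toMap.asJava
    val response = bigquery.insertAll(
      InsertAllRequest.newBuilder(TableId.of(dataset, table)).addRow(content).build())
    if (response.hasErrors)
      throw new RuntimeException(s"Streaming insert failed: ${response.getInsertErrors}")
  }

  override def close(errorOrNull: Throwable): Unit = ()
}
```

Wired up with `df.writeStream.foreach(new BigQueryStreamingSink("my_dataset", "events")).start()`; a production version would buffer rows and issue one `insertAll` per batch rather than per row.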
The default quota for streaming is a maximum of 1 GB per second per GCP project. Any ingestion above this limit results in a BigQueryException with a quotaExceeded error. (View Highlight)
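A sketch of one way to handle that error, assuming the Java client's `BigQueryException` surfaces the reason string `quotaExceeded` (the function name and retry policy are illustrative):

```scala
import com.google.cloud.bigquery.{BigQuery, BigQueryException, InsertAllRequest, InsertAllResponse}

// Hypothetical retry wrapper: backs off when the per-project streaming quota
// is exhausted and BigQuery raises a quotaExceeded error.
def insertWithBackoff(bigquery: BigQuery,
                      request: InsertAllRequest,
                      maxAttempts: Int = 5): InsertAllResponse = {
  var attempt = 0
  while (true) {
    try {
      return bigquery.insertAll(request)
    } catch {
      case e: BigQueryException
          if e.getError != null && e.getError.getReason == "quotaExceeded" =>
        attempt += 1
        if (attempt >= maxAttempts) throw e
        // Exponential backoff: 1s, 2s, 4s, ... before retrying the insert.
        Thread.sleep(1000L * (1L << (attempt - 1)))
    }
  }
  throw new IllegalStateException("unreachable")
}
```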
When I enabled “dedupe”, only one record was duplicated for every 5 million records ingested. Enabling deduplication does not guarantee 100% duplicate removal; rather, it is only a best-effort attempt to remove duplicates. (View Highlight)
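The deduplication referred to here is the legacy streaming API's `insertId` mechanism: BigQuery drops retransmissions that reuse the same id within a short window, on a best-effort basis. A sketch, assuming the Java client (dataset, table, and field names are placeholders):

```scala
import java.util.UUID
import scala.jdk.CollectionConverters._
import com.google.cloud.bigquery.{BigQueryOptions, InsertAllRequest, TableId}

val bigquery = BigQueryOptions.getDefaultInstance.getService
val tableId  = TableId.of("my_dataset", "events") // placeholder names

// Reuse the SAME insertId when retrying a failed insert, so BigQuery can
// recognize the row as a retransmission and (best-effort) drop the duplicate.
val insertId = UUID.randomUUID().toString
val row      = Map[String, AnyRef]("event" -> "click", "user_id" -> "42").asJava

val request = InsertAllRequest.newBuilder(tableId)
  .addRow(insertId, row) // overload taking (insertId, content)
  .build()

val response = bigquery.insertAll(request)
if (response.hasErrors) println(s"insert errors: ${response.getInsertErrors}")
```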
When you query a streaming table you end up reading data from the write-optimized streaming buffer, and that is the exact reason for the higher latency. (View Highlight)
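One way to see whether a table still has rows sitting in that buffer is the streaming-buffer statistics exposed through the client library's table metadata; a sketch with placeholder names:

```scala
import com.google.cloud.bigquery.{BigQueryOptions, StandardTableDefinition, TableId}

// Check whether a table still has rows in the write-optimized streaming
// buffer (the part of the table that is slower to scan than managed storage).
val bigquery = BigQueryOptions.getDefaultInstance.getService
val table    = bigquery.getTable(TableId.of("my_dataset", "events"))

val buffer = table
  .getDefinition[StandardTableDefinition]
  .getStreamingBuffer // null once everything has been flushed

if (buffer != null)
  println(s"~${buffer.getEstimatedRows} rows / ~${buffer.getEstimatedBytes} bytes still in the streaming buffer")
else
  println("streaming buffer empty; data fully committed to columnar storage")
```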