Metadata
- Author: Christian Yarros
- Full Title:: Guide: Run Faster and Cost-Effective Dataproc Jobs
- Category:: 🗞️Articles
- Document Tags:: spark
- URL:: https://cloud.google.com/blog/products/data-analytics/dataproc-job-optimization-how-to-guide
- Finished date:: 2023-03-26
Highlights
An autoscaling cluster can help determine the right number of workers for your application (View Highlight)
Prefer using smaller machine types (e.g. switch n2-highmem-32 to n2-highmem-8). It’s okay to have clusters with hundreds of small machines. For Dataproc clusters, choose the smallest machine with maximum network bandwidth (32 Gbps). Typically these machines are n2-standard-8 or n2d-standard-16 (View Highlight)
If you prioritize performance, utilize 100% primary workers. If you prioritize cost optimization, specify the remaining workers to be secondary workers (View Highlight)
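The primary/secondary trade-off in the highlights above can be sketched as a small helper. This is a rough illustration, not code from the article: `split_workers`, `secondary_fraction`, and `min_primary` are hypothetical names, the floor of two primary workers is an assumption, and the returned dictionary only loosely mirrors the shape of a Dataproc cluster config rather than calling any real API.

```python
def split_workers(
    total_workers: int,
    secondary_fraction: float = 0.0,
    machine_type: str = "n2-standard-8",
    min_primary: int = 2,
) -> dict:
    """Split a worker count into primary and secondary workers.

    secondary_fraction=0.0 corresponds to "100% primary workers"
    (performance first); a higher fraction shifts workers to the
    secondary group (cost first). All names are illustrative.
    """
    if total_workers < min_primary:
        raise ValueError("total_workers must be at least min_primary")
    if not 0.0 <= secondary_fraction <= 1.0:
        raise ValueError("secondary_fraction must be between 0 and 1")

    # Round the secondary share down, then make sure the assumed
    # minimum number of primary workers is still respected.
    secondary = int(total_workers * secondary_fraction)
    primary = max(min_primary, total_workers - secondary)
    secondary = total_workers - primary

    return {
        "worker_config": {
            "num_instances": primary,
            "machine_type_uri": machine_type,
        },
        "secondary_worker_config": {"num_instances": secondary},
    }


# Performance-first: every worker is a primary worker.
performance = split_workers(10)

# Cost-first: half of the workers become secondary workers.
cost = split_workers(10, secondary_fraction=0.5)
```

The default machine type follows the highlight's suggestion of small machines such as n2-standard-8; the right fraction of secondary workers depends on the workload and is not specified in the highlights.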