
## Metadata
- Author:: [[christian-yarros|Christian Yarros]]
- Full Title:: Guide: Run Faster and Cost-Effective Dataproc Jobs
- Category:: #🗞️Articles
- Document Tags:: [[spark|Spark]]
- URL:: https://cloud.google.com/blog/products/data-analytics/dataproc-job-optimization-how-to-guide
- Finished date:: [[2023-03-26]]
## Highlights
> autoscaling cluster can help determine the right number of workers for your application ([View Highlight](https://read.readwise.io/read/01gwdhtrx38znc5sa8q58zbt8b))
> Prefer using smaller machine types (e.g. switch n2-highmem-32 to n2-highmem-8). It’s okay to have clusters with hundreds of small machines. For Dataproc clusters, choose the smallest machine with maximum network bandwidth (32 Gbps). Typically these machines are n2-standard-8 or n2d-standard-16 ([View Highlight](https://read.readwise.io/read/01gwdhwbp3djxgm3xzqdhpmz2k))
> If you prioritize performance, utilize 100% primary workers. If you prioritize cost optimization, specify the remaining workers to be secondary workers ([View Highlight](https://read.readwise.io/read/01gwdhyqr831fgmxw7fh90n85x))
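
These highlights are all cluster-shape decisions. The Python below is a rough sketch of how they could become a Dataproc cluster request body. It is not taken from the article. It uses small `n2-standard-8` workers and puts 100% of workers on primaries when performance matters. When cost matters, it splits workers between primary and secondary, and it can attach an autoscaling policy. The field names (`worker_config`, `secondary_worker_config`, `autoscaling_config.policy_uri`, `basic_algorithm.yarn_config`, …) are my assumption of the Dataproc v1 `ClusterConfig` and `AutoscalingPolicy` resources. Several details are hypothetical placeholders, not recommendations:

- the helper names
- the 50% primary share in cost mode
- the autoscaling numbers

Check the API reference before sending anything.

```python
from typing import Literal, Optional

# Small machine type from the highlight; verify its network bandwidth yourself.
DEFAULT_MACHINE_TYPE = "n2-standard-8"

Priority = Literal["performance", "cost"]


def split_workers(total_workers: int, priority: Priority,
                  primary_fraction_for_cost: float = 0.5) -> tuple[int, int]:
    """Return (primary, secondary) worker counts.

    performance -> 100% primary workers (per the highlight).
    cost        -> keep a share as primary, the rest as secondary.
    primary_fraction_for_cost is a hypothetical knob, not from the article.
    """
    if priority not in ("performance", "cost"):
        raise ValueError(f"unknown priority: {priority!r}")
    if total_workers < 2:
        # Standard Dataproc clusters need at least 2 primary workers.
        raise ValueError("need at least 2 workers")
    if priority == "performance":
        return total_workers, 0
    primary = max(2, int(total_workers * primary_fraction_for_cost))
    primary = min(primary, total_workers)
    return primary, total_workers - primary


def build_cluster_config(total_workers: int, priority: Priority,
                         machine_type: str = DEFAULT_MACHINE_TYPE,
                         autoscaling_policy_uri: Optional[str] = None) -> dict:
    """Hypothetical helper: assemble a ClusterConfig-shaped dict."""
    primary, secondary = split_workers(total_workers, priority)
    config: dict = {
        "worker_config": {
            "num_instances": primary,
            "machine_type_uri": machine_type,
        },
    }
    if secondary:
        # Machine type omitted: secondary workers normally reuse the
        # primary workers' machine type.
        config["secondary_worker_config"] = {
            "num_instances": secondary,
            "preemptibility": "SPOT",  # or PREEMPTIBLE / NON_PREEMPTIBLE
        }
    if autoscaling_policy_uri:
        config["autoscaling_config"] = {"policy_uri": autoscaling_policy_uri}
    return config


def build_autoscaling_policy(policy_id: str, min_primary: int,
                             max_primary: int, max_secondary: int) -> dict:
    """Hypothetical helper: an AutoscalingPolicy-shaped dict.

    The scaling factors and durations are illustrative placeholders.
    """
    return {
        "id": policy_id,
        "worker_config": {"min_instances": min_primary,
                          "max_instances": max_primary},
        "secondary_worker_config": {"min_instances": 0,
                                    "max_instances": max_secondary},
        "basic_algorithm": {
            "yarn_config": {
                "scale_up_factor": 1.0,
                "scale_down_factor": 1.0,
                "graceful_decommission_timeout": "3600s",
            },
            "cooldown_period": "120s",
        },
    }
```

For example, `build_cluster_config(10, "cost")` yields 5 primary and 5 secondary workers, while `"performance"` keeps all 10 as primaries. The snake_case keys should match the `google-cloud-dataproc` Python client's field names. The REST API expects camelCase equivalents, so the dicts would need converting there.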