Guide: Run Faster and Cost-Effective Dataproc Jobs

![rw-book-cover](https://storage.googleapis.com/gweb-cloudblog-publish/images/DO_NOT_USE_CUxs9oC.max-2500x2500.jpg)

## Metadata
- Author: [[christian-yarros|Christian Yarros]]
- Full Title:: Guide: Run Faster and Cost-Effective Dataproc Jobs
- Category:: #🗞️Articles
- Document Tags:: [[spark|Spark]]
- URL:: https://cloud.google.com/blog/products/data-analytics/dataproc-job-optimization-how-to-guide
- Finished date:: [[2023-03-26]]

## Highlights
> autoscaling cluster can help determine the right number of workers for your application ([View Highlight](https://read.readwise.io/read/01gwdhtrx38znc5sa8q58zbt8b))

> Prefer using smaller machine types (e.g. switch n2-highmem-32 to n2-highmem-8). It’s okay to have clusters with hundreds of small machines. For Dataproc clusters, choose the smallest machine with maximum network bandwidth (32 Gbps). Typically these machines are n2-standard-8 or n2d-standard-16 ([View Highlight](https://read.readwise.io/read/01gwdhwbp3djxgm3xzqdhpmz2k))

> If you prioritize performance, utilize 100% primary workers. If you prioritize cost optimization, specify the remaining workers to be secondary workers ([View Highlight](https://read.readwise.io/read/01gwdhyqr831fgmxw7fh90n85x))
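
To make the machine-type and worker-split highlights concrete, here is a minimal sketch that builds a `gcloud dataproc clusters create` command line from a stated priority. The helper name `dataproc_create_command`, the `primary_share` parameter, and its default of 0.5 are hypothetical and not from the article, which does not say what fraction of workers should stay primary in the cost-optimized case. The cluster name and region in the usage lines are placeholders.

```python
import math
import shlex


def dataproc_create_command(cluster, region, total_workers,
                            prioritize="performance",
                            primary_share=0.5,
                            machine_type="n2-standard-8"):
    """Build an argument list for `gcloud dataproc clusters create`.

    primary_share is an assumed tuning knob (not from the article): the
    fraction of workers kept as primary when optimizing for cost.
    """
    if prioritize == "performance":
        # Performance first: 100% primary workers.
        primary = total_workers
    elif prioritize == "cost":
        # Cost first: keep a share as primary, make the rest secondary.
        primary = max(1, math.ceil(total_workers * primary_share))
    else:
        raise ValueError("prioritize must be 'performance' or 'cost'")
    secondary = total_workers - primary

    args = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        # A small machine type, as the highlight recommends.
        f"--worker-machine-type={machine_type}",
        f"--num-workers={primary}",
    ]
    if secondary > 0:
        args.append(f"--num-secondary-workers={secondary}")
    return args


# Placeholder cluster name and region, for illustration only.
perf_args = dataproc_create_command("demo-cluster", "us-central1", 10)
cost_args = dataproc_create_command("demo-cluster", "us-central1", 10,
                                    prioritize="cost")
print(shlex.join(perf_args))
print(shlex.join(cost_args))
```

The function only assembles the command and does not run it, so the output can be reviewed or adjusted (for example, to attach an autoscaling policy) before anything is created.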