
Metadata

Highlights

Today's talk is, we're not actually going to go through Spark code. We're teaching you how to optimize Spark for how you lay your data out and actually save it, and that is honestly 80% of the game of efficiency for Spark: choosing file formats that are good, choosing the right file sizes, choosing the right partitions, and other things. So it has nothing to do with code, just how you lay out your data within Spark.
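To make the file-size and partition choices a bit more concrete, here is a minimal sketch in plain Python. It is not from the talk: the 128 MiB target file size, the function names `plan_output_files` and `partition_path`, and the example paths are all assumptions for illustration only.

```python
import math

# Assumed target size per output file; the talk does not name a number.
TARGET_FILE_BYTES = 128 * 1024**2  # 128 MiB


def plan_output_files(total_bytes: int, target_file_bytes: int = TARGET_FILE_BYTES) -> int:
    """Return how many files to write so each one lands near the target size."""
    if total_bytes <= 0:
        return 1
    return max(1, math.ceil(total_bytes / target_file_bytes))


def partition_path(base: str, **partition_values: str) -> str:
    """Build a Hive-style directory path, e.g. base/date=2024-01-01/country=US."""
    parts = [f"{key}={value}" for key, value in partition_values.items()]
    return "/".join([base.rstrip("/"), *parts])


# 10 GiB of data split into files of roughly 128 MiB each.
n_files = plan_output_files(10 * 1024**3)

# Hypothetical table location and partition columns.
path = partition_path("/data/events", date="2024-01-01", country="US")
```

In a Spark job, a count like `n_files` could be passed to something like `df.repartition(n_files)` before writing, though the right number depends on the data and the cluster.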

In Spark 2.0 we persist every single partition into the metastore and then push partition pruning into the metastore. In this way you do not need to perform partition discovery every time you read the table from the file system.
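The idea can be sketched with a toy model in plain Python. This is not Spark's or Hive's actual implementation: `ToyMetastore`, its methods, and the paths are hypothetical, and the sketch only shows why a catalog lookup can stand in for listing directories on every read.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyMetastore:
    """Hypothetical in-memory catalog: table name -> {partition value -> directory}."""

    tables: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_partition(self, table: str, value: str, location: str) -> None:
        # Persist the partition once, at write time.
        self.tables.setdefault(table, {})[value] = location

    def prune(self, table: str, predicate: Callable[[str], bool]) -> List[str]:
        # Answer the filter from the catalog alone; no file system listing needed.
        partitions = self.tables.get(table, {})
        return [loc for value, loc in sorted(partitions.items()) if predicate(value)]


store = ToyMetastore()
for day in ("2024-01-01", "2024-01-02", "2024-01-03"):
    store.add_partition("events", day, f"/data/events/date={day}")

# Only the directories matching the filter are handed to the reader.
to_scan = store.prune("events", lambda d: d >= "2024-01-02")
```

Here `to_scan` holds just the two matching directories, so a reader would never touch the `date=2024-01-01` directory or walk the table's directory tree to find out what exists.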