- Tags:: #📚Books , [[SQL]]
- Author:: [[Ergest Xheblati]]
- Liked:: #3/5
- Link:: [Minimum Viable SQL Patterns (gumroad.com)](https://ergestx.gumroad.com/l/sqlpatterns)
- Source date:: [[2022-04-12]]
- Read date:: [[2023-10-08]]
- Cover:: ![[cover_minimum_viable_sql_patterns.png|100]]

## Why did I want to read it?

Looking for tips & tricks for new-to-SQL [[Business Analysis|Business Analysts]].

## What did I get out of it?

BI tools such as Looker sometimes help us avoid writing the aggregation ourselves.

> This is a very common pattern of storing data in a data warehouse. You keep the table at the finest possible grain (i.e. one transaction per row) and then aggregate ([View Highlight](https://read.readwise.io/read/01hc7q7n1q98tfdv9yq6ytmjtz))

> it up to whatever level is needed for reporting ([View Highlight](https://read.readwise.io/read/01hc7q859pjhk0mzre316k718n))

There are other ways to do this (e.g. with a PIVOT SQL clause).

> Pivoting Data ([View Highlight](https://read.readwise.io/read/01hc7qcnefbxe5vsqvn0r8j1hm))

WHERE TRUE makes it easy to read, add and remove WHERE conditions.

> WHERE
> TRUE AND ([View Highlight](https://read.readwise.io/read/01hc7qhhc2zbpz8ysw2rn08v1z))

I think this example is a bit convoluted: if you LEFT JOIN a table but then filter on it in the WHERE clause, the NULLs produced for unmatched rows fail your boolean conditions, so those rows are dropped and the join behaves like an INNER JOIN.

> Suppose we only want to see users with a reputation of higher than 50. That seems pretty straightforward, just add the condition to the where clause
> SELECT
> ph.post_id, ph.user_id, u.display_name AS user_name, ph.creation_date AS activity_date
> FROM
> bigquery-public-data.stackoverflow.post_history ph LEFT JOIN bigquery-public-data.stackoverflow.users u ON u.id = ph.user_id
> WHERE
> TRUE AND ph.post_id = 4 AND u.reputation > 50
> ORDER BY activity_date;
> We only get 56 rows! What happened? Adding filters on the where clause for tables that are left joined will ALWAYS perform an INNER JOIN except for one single condition where the left join is preserved. If we wanted to filter rows in the users table and still do a LEFT JOIN we have to add the filter in the join condition like so: ([View Highlight](https://read.readwise.io/read/01hc7qxehz15ypjh4b99rgk6x1))

> Starting with a LEFT JOIN Since we’re on the subject of LEFT JOINS, one of my most used rules of thumb is to always use a LEFT JOIN when I’m not sure if one table is a subset of the other ([View Highlight](https://read.readwise.io/read/01hc7qvb70krp4c4f5ax3k59rp))

I would go as far as adding another CTE that filters on "created", so that the restriction is even easier to spot (see the sketch after the highlight below).

> If we simply joined with post_activity on post_id without specifying the activity_type we’d get at least two times the number of rows ([View Highlight](https://read.readwise.io/read/01hc7rbqn44rgcpyybp1a0xp97))
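A minimal sketch of what I mean, in BigQuery SQL. The shape of `post_activity` and the `post_history_type_id` mapping are my own reconstruction, not the book's exact code:

```sql
WITH post_activity AS (
    -- One row per post/user/timestamp, tagged as creation or edit
    -- (assuming type ids 1-3 = initial title/body/tags, 4-6 = edits).
    SELECT
        ph.post_id,
        ph.user_id,
        ph.creation_date AS activity_date,
        CASE
            WHEN ph.post_history_type_id IN (1, 2, 3) THEN 'created'
            ELSE 'edited'
        END AS activity_type
    FROM `bigquery-public-data.stackoverflow.post_history` ph
    WHERE ph.post_history_type_id BETWEEN 1 AND 6
    GROUP BY 1, 2, 3, 4  -- collapse the initial title/body/tags rows into one
),
-- The extra CTE I'd add: the "created only" restriction gets its own name.
post_creation AS (
    SELECT post_id, user_id, activity_date
    FROM post_activity
    WHERE activity_type = 'created'
)
SELECT
    pq.id,
    pq.title,
    pc.user_id,
    pc.activity_date
FROM `bigquery-public-data.stackoverflow.posts_questions` pq
JOIN post_creation pc
    ON pc.post_id = pq.id;
```

This way the final join can stay on post_id alone without doubling rows, because the activity_type filter is already baked into the CTE's name.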
We do this with dbt.

> Creating Views ([View Highlight](https://read.readwise.io/read/01hc7rgy41b9yxneeges0ek2k4))

Careful: this is NOT the pricing model of BigQuery, where on-demand queries are billed by bytes scanned, not by how long or how fast they run.

> Faster queries get you results faster, while consuming fewer resources, making them cheaper on modern data warehouses ([View Highlight](https://read.readwise.io/read/01hc7rjjtwwntj8kvthbr01xwj))

In BQ this does not happen automatically: the table would need to be partitioned on the column used in the filter, otherwise the whole table is scanned (and billed) anyway.

> What we’re doing here is filtering each table to only 90 days so we can both keep costs down and make the query faster ([View Highlight](https://read.readwise.io/read/01hc7rndz2hakdbrpgjngr28d1))

> When you compare a column to a fixed value or to another column, the query optimizer can filter down to the relevant rows much faster. When you use a function or a complicated formula, the optimizer needs to scan the entire table to do the filtering ([View Highlight](https://read.readwise.io/read/01hc7rrsfsa228jgm84qc8846s))
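To make those last two notes concrete, a sketch (the `my_project.my_dataset` table is a placeholder; the public Stack Overflow tables themselves are not partitioned):

```sql
-- Hypothetical copy of post_history, partitioned by the column we filter on.
CREATE TABLE `my_project.my_dataset.post_history`
PARTITION BY DATE(creation_date) AS
SELECT * FROM `bigquery-public-data.stackoverflow.post_history`;

-- Plain comparison against the partition column: the optimizer can use the
-- filter for partition pruning, so only ~90 days of data get scanned.
SELECT COUNT(*)
FROM `my_project.my_dataset.post_history`
WHERE creation_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY);

-- Same logical filter, but the column is wrapped in a function, so the
-- engine has to evaluate it for every row and scans the whole table.
SELECT COUNT(*)
FROM `my_project.my_dataset.post_history`
WHERE TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), creation_date, DAY) <= 90;
```

Both WHERE clauses keep roughly the same rows, but only the first one lets BigQuery skip partitions and keep the bytes-scanned bill down.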