
## Metadata
- Author: [[jorge-sancha|Jorge Sancha]]
- Full Title:: Why LLMs Struggle With Analytics
- Category:: #🗞️Articles
- Document Tags:: [[Text to sql|Text To Sql]],
- URL:: https://www.tinybird.co/blog-posts/why-llms-struggle-with-analytics-and-how-we-fixed-that?utm_source=substack&utm_medium=email
- Read date:: [[2025-07-28]]
## Highlights
> Despite all the hype, LLMs rarely write perfect SQL on the first try. We've seen this in our own [LLM SQL benchmark](https://llm-benchmark.tinybird.live/), but don't take our word for it. Stanford research figured out early that even top performing language models [struggle significantly with complex SQL generation](https://arxiv.org/abs/2407.19517), especially for joins and database-specific syntax. ([View Highlight](https://read.readwise.io/read/01k180e4795qv1k8p0d5wcvqwc))
>  ([View Highlight](https://read.readwise.io/read/01k180eca4wf3pg0gyha5dc6yq))
Por otro lado, que peligro los “Data Patterns”, teniendo en cuenta que pueden cambiar.
> ### 4. Data Patterns - **Time range**: Rolling window of 12 months. Mostly active within past 90 days. - **Seasonality**: Weekly and monthly patterns—sales often spike on weekends or during campaigns. - **Currency normalization**: Revenue assumed to be in a single currency (e.g., USD). - **Country bias**: US and EU countries dominate volume. - **Skew**: 80/20 rule: few categories or customers account for most revenue. - **Completeness**: All fields are non-null; data is clean and consistently formatted. --- ### 5. Relationships and Join Potential - Can be joined on `product_id` to a product catalog for metadata (not included here). - Can be joined on `customer_id` to a customer profile table. - Often aggregated by `category`, `country`, or `timestamp`. - Not normalized—revenue is fully materialized for fast querying. --- ### 6. Example Questions the LLM Should Be Able to Answer - "Why did revenue drop last week in the US?" - "Which categories contributed most to last month's sales?" - "What was the average order value for France in June?" - "How does current revenue compare to the same week last year?" - "Which customers had the highest spend in Q2?" - "Was the revenue drop due to fewer orders or lower AOV?" - "Did any category underperform relative to last month?" --- ### 7. Query Tips for the LLM - Always filter by `timestamp` with explicit ranges (e.g., `>= now() - INTERVAL 7 DAY`) - Use `country` = 'US' or any region as a filter when geographic context is needed - Use `category` for group-by breakdowns - Pre-aggregate when possible using MATERIALIZED views for week/month summaries - Use ratio comparisons (`this_week/last_week`) for trend explanations ([View Highlight](https://read.readwise.io/read/01k180qezbqa04k3z40w25v35p))
> Also: Instruct your LLM to apply [SQL best practices](https://www.tinybird.co/blog-posts/5-rules-for-writing-faster-sql-queries) e.g., use sorting keys for filtering, filter before joining, select only the necessary columns, etc. These details matter a lot for performance. ([View Highlight](https://read.readwise.io/read/01k180shqmnp23edbwrh3pq4fh))