Metadata
- Author: Jorge Sancha
- Full Title:: Why LLMs Struggle With Analytics
- Category:: 🗞️Articles
- Document Tags:: text to sql,
- URL:: https://www.tinybird.co/blog-posts/why-llms-struggle-with-analytics-and-how-we-fixed-that?utm_source=substack&utm_medium=email
- Read date:: 2025-07-28
Highlights
Despite all the hype, LLMs rarely write perfect SQL on the first try. We’ve seen this in our own LLM SQL benchmark, but don’t take our word for it. Stanford research figured out early that even top performing language models struggle significantly with complex SQL generation, especially for joins and database-specific syntax. (View Highlight)
Por otro lado, que peligro los “Data Patterns”, teniendo en cuenta que pueden cambiar.
4. Data Patterns - Time range: Rolling window of 12 months. Mostly active within past 90 days. - Seasonality: Weekly and monthly patterns—sales often spike on weekends or during campaigns. - Currency normalization: Revenue assumed to be in a single currency (e.g., USD). - Country bias: US and EU countries dominate volume. - Skew: 80/20 rule: few categories or customers account for most revenue. - Completeness: All fields are non-null; data is clean and consistently formatted. --- 5. Relationships and Join Potential - Can be joined on
product_id
to a product catalog for metadata (not included here). - Can be joined oncustomer_id
to a customer profile table. - Often aggregated bycategory
,country
, ortimestamp
. - Not normalized—revenue is fully materialized for fast querying. --- 6. Example Questions the LLM Should Be Able to Answer - “Why did revenue drop last week in the US?” - “Which categories contributed most to last month’s sales?” - “What was the average order value for France in June?” - “How does current revenue compare to the same week last year?” - “Which customers had the highest spend in Q2?” - “Was the revenue drop due to fewer orders or lower AOV?” - “Did any category underperform relative to last month?” --- 7. Query Tips for the LLM - Always filter bytimestamp
with explicit ranges (e.g.,>= now() - INTERVAL 7 DAY
) - Usecountry
= ‘US’ or any region as a filter when geographic context is needed - Usecategory
for group-by breakdowns - Pre-aggregate when possible using MATERIALIZED views for week/month summaries - Use ratio comparisons (this_week/last_week
) for trend explanations (View Highlight)
Also: Instruct your LLM to apply SQL best practices e.g., use sorting keys for filtering, filter before joining, select only the necessary columns, etc. These details matter a lot for performance. (View Highlight)