# PostgreSQL Experts Read EXPLAIN ANALYZE, Not Guesses—Voice AI for Demos Proves Why Guidance Must Read DOM, Not Generate Responses
*Hacker News #4 (146 points, 16 comments, 5hr): Database expert shares unconventional optimization techniques. The pattern: Read actual execution plans (`EXPLAIN ANALYZE`), read check constraints, read statistics—never guess query performance. Demo guidance works the same way: Read DOM structure, don't generate responses.*
---
## The Three Unconventional Optimizations (And Why They All Read Reality)
A PostgreSQL expert shared three unconventional optimization techniques on Hacker News. Each technique follows the same pattern: **Read what actually exists, don't guess or assume.**
### Technique #1: Eliminate Full Table Scans Based on Check Constraints
**The Problem:**
You have a users table with a check constraint:
```sql
CREATE TABLE users (
id INT PRIMARY KEY,
username TEXT NOT NULL,
plan TEXT NOT NULL,
CONSTRAINT plan_check CHECK (plan IN ('free', 'pro'))
);
```
An analyst writes this query:
```sql
SELECT * FROM users WHERE plan = 'Pro'; -- Capital 'P'
```
**The result:** 0 rows (the plan is 'pro' with lowercase 'p')
**The cost:** PostgreSQL scans the entire table! Even though the check constraint guarantees no row can have the value 'Pro', the database still checks every row.
**The Fix:**
```sql
SET constraint_exclusion TO on;  -- default is 'partition', which only covers partitioned tables
```
Now PostgreSQL **reads the check constraint** before executing the query. It realizes the condition `plan = 'Pro'` will never return rows, so it skips the scan entirely.
**Execution time:** 7.4ms → 0.008ms
**The principle:** Read constraints (reality about data rules) instead of scanning tables (guessing what exists).
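With `constraint_exclusion` enabled, re-running `EXPLAIN` confirms the constraint is actually being read. The plan shown in the comments below is representative output, not verbatim:

```sql
EXPLAIN SELECT * FROM users WHERE plan = 'Pro';
-- The sequential scan disappears entirely; the planner proves the
-- predicate contradicts the check constraint and emits roughly:
--   Result  (cost=0.00..0.00 rows=0 width=0)
--     One-Time Filter: false
```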
### Technique #2: Optimize for Lower Cardinality with Function-Based Indexes
**The Problem:**
You have 10 million sales records with timestamps:
```sql
CREATE TABLE sale (
id INT PRIMARY KEY,
sold_at TIMESTAMPTZ NOT NULL,
charged INT NOT NULL
);
```
Analysts produce daily reports:
```sql
SELECT date_trunc('day', sold_at AT TIME ZONE 'UTC'), SUM(charged)
FROM sale
WHERE '2025-01-01 UTC' <= sold_at AND sold_at < '2025-02-01 UTC'
GROUP BY 1;
```
**Full table scan:** 627ms
**After adding B-Tree index on `sold_at`:** 187ms
**Index size:** 214 MB (almost half the table size!)
**The insight:** Analysts want **daily** reports, but you indexed **millisecond** precision. You're giving them far more granularity than they need.
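Before changing the index, the actual cardinality can be read directly (a quick check against the `sale` table above):

```sql
SELECT count(*)                                               AS total_rows,
       count(DISTINCT sold_at)                                AS distinct_timestamps,
       count(DISTINCT (date_trunc('day',
                       sold_at AT TIME ZONE 'UTC'))::date)    AS distinct_days
FROM sale;
-- 10 million rows can hold millions of distinct timestamps but only
-- ~30 distinct days per month -- the index only needs to tell days apart.
```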
**The unconventional fix:**
Create a function-based index on just the **date**, not the full timestamp:
```sql
CREATE INDEX sale_sold_at_date_ix
ON sale((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
```
**Index size:** 66 MB (3x smaller)
**Query time:** 145ms (faster than the large index!)
**The problem:** Function-based indexes are fragile. If the analyst writes the expression slightly differently, the database won't use the index.
**The solution:** Virtual generated columns (PostgreSQL 18+):
```sql
ALTER TABLE sale
ADD sold_at_date DATE
GENERATED ALWAYS AS ((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
```
Now analysts use `sold_at_date` in queries, and the database is guaranteed to use the index. No discipline required.
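The daily report from earlier, rewritten against the generated column (a sketch; date literals now compare directly against the `DATE` column):

```sql
SELECT sold_at_date, SUM(charged)
FROM sale
WHERE '2025-01-01' <= sold_at_date AND sold_at_date < '2025-02-01'
GROUP BY 1;
```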
**The principle:** Read the actual cardinality (distinct dates, not distinct timestamps) instead of abstracting away precision.
### Technique #3: Enforce Uniqueness with Hash Indexes
**The Problem:**
You store URLs to avoid reprocessing the same page twice:
```sql
CREATE TABLE urls (
id INT PRIMARY KEY,
url TEXT NOT NULL,
data JSON
);
CREATE UNIQUE INDEX urls_url_unique_ix ON urls(url);
```
**Table size:** 160 MB
**Index size:** 154 MB (almost the entire table!)
Why? B-Tree indexes store the actual values in leaf blocks. Web URLs can be massive (some apps store entire application state in URLs).
**The unconventional fix:**
Use a Hash index via exclusion constraint:
```sql
ALTER TABLE urls
ADD CONSTRAINT urls_url_unique_hash
EXCLUDE USING HASH (url WITH =);
```
**Hash index size:** 32 MB (5x smaller!)
**Why it works:** Hash indexes store hash values, not the actual URLs. Much smaller.
**Query performance:** 0.022ms (faster than B-Tree's 0.046ms)
**The principle:** Read hash values (fixed-size representation of data) instead of storing full values (variable-size reality).
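The exclusion constraint enforces uniqueness just like a unique index would; inserting a duplicate URL fails. The error text below is representative:

```sql
INSERT INTO urls (id, url) VALUES (1, 'https://example.com/a');  -- OK
INSERT INTO urls (id, url) VALUES (2, 'https://example.com/a');
-- ERROR:  conflicting key value violates exclusion constraint "urls_url_unique_hash"
```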
## The Common Pattern: Read Actual Execution, Don't Guess Performance
All three techniques share the same approach:
**Conventional optimization:**
1. Query is slow
2. Guess what might help
3. Add index
4. Hope it works
**Unconventional optimization:**
1. Query is slow
2. **Read `EXPLAIN ANALYZE` output** (actual execution plan)
3. **Read table statistics** (actual cardinality, actual distribution)
4. **Read constraints** (actual data rules)
5. Optimize based on reality, not assumptions
## Why Database Experts Use EXPLAIN ANALYZE
The article reveals the fundamental tool of database optimization:
```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE plan = 'Pro';
```
**Output:**
```
Seq Scan on users (cost=0.00..2185.00 rows=1)
(actual time=7.406..7.407 rows=0.00 loops=1)
Filter: (plan = 'Pro'::text)
Rows Removed by Filter: 100000
Execution Time: 7.436 ms
```
**What EXPLAIN ANALYZE reveals:**
- **Actual execution:** Sequential scan (not index scan)
- **Actual rows processed:** 100,000 (not estimated)
- **Actual time:** 7.4ms (not guessed)
- **Actual filtering:** Every row checked (not skipped)
**The insight:** You can't optimize what you don't measure. `EXPLAIN ANALYZE` reads the actual query execution—it doesn't generate a prediction, it observes reality.
## The Three Levels of "Reading Reality" in PostgreSQL
### Level #1: Read Execution Plans (EXPLAIN ANALYZE)
**What it reads:**
- Actual rows scanned
- Actual indexes used
- Actual execution time
- Actual memory usage
**Why it matters:** You see what the database **actually did**, not what the query planner **predicted** it would do.
**The parallel to demos:** Voice AI reads what the page **actually contains** (DOM structure), not what it **predicts** users want to know.
### Level #2: Read Table Statistics
**What it reads:**
- Row count (actual table size)
- Column cardinality (distinct values)
- Data distribution (most common values, histogram)
- Null fraction (percentage of nulls)
**Why it matters:** The query planner uses these statistics to **estimate** execution cost. Reading statistics = reading reality about your data.
**The parallel to demos:** Voice AI reads heading hierarchy, button labels, form fields—actual page semantics, not predicted user intent.
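In PostgreSQL, all of these statistics can be read from the standard `pg_stats` view:

```sql
SELECT attname, n_distinct, null_frac, most_common_vals
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'sale';
```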
### Level #3: Read Constraints
**What it reads:**
- Check constraints (valid value ranges)
- Unique constraints (no duplicates allowed)
- Foreign keys (referential integrity)
- NOT NULL constraints (required fields)
**Why it matters:** Constraints define data rules. Reading constraints allows the database to eliminate impossible conditions **without scanning data**.
**The parallel to demos:** Voice AI reads ARIA labels, semantic HTML, `alt` text—accessibility metadata that defines page meaning.
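Constraints can themselves be read from the system catalog (a sketch using `pg_constraint`; the example output is representative of how PostgreSQL normalizes `IN` lists):

```sql
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'users'::regclass;
-- e.g.  plan_check | CHECK (plan = ANY (ARRAY['free'::text, 'pro'::text]))
```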
## Why Guessing Query Performance Doesn't Work
The article shows what happens when you optimize without reading reality:
**Scenario #1: Guessing index needs**
- Analyst queries are slow
- You guess: "Add an index on `sold_at`"
- Result: 214 MB index (50% of table size), only marginally faster queries
**Reading reality instead:**
- Run `EXPLAIN ANALYZE` to see actual query pattern
- Read query logs to see actual grouping (by day, not millisecond)
- Create smaller, targeted index on just the date
- Result: 66 MB index (3x smaller), faster queries
**Scenario #2: Guessing constraint impact**
- Query returns 0 rows
- You guess: "Database must scan entire table to confirm no matches"
- Result: 7.4ms wasted on every impossible query
**Reading reality instead:**
- Enable `constraint_exclusion`
- Database reads check constraints before executing query
- Realizes condition is impossible, skips scan
- Result: 0.008ms (1000x faster)
## The Parallel: Chatbot Demos Guess, Voice AI Reads
The PostgreSQL optimization pattern mirrors the demo guidance pattern exactly:
### Chatbot Demo Pattern = Guessing Query Performance
**Chatbot approach:**
1. User asks: "How does pricing work?"
2. Chatbot generates answer from training data (guessing based on past patterns)
3. User must verify against page (like running `EXPLAIN ANALYZE` after optimizing)
4. If answer is wrong, user wasted time (like a bad index: costs resources, doesn't help)
**Why it fails:** Generating responses without reading the page is like optimizing queries without `EXPLAIN ANALYZE`—you're guessing, not measuring.
### Voice AI Pattern = Reading Execution Plans
**Voice AI approach:**
1. User asks: "How does pricing work?"
2. Voice AI reads page structure:
- Heading: "Pricing Plans"
- Three cards: "Free", "Pro", "Enterprise"
- Bullet points under each
3. Voice AI describes what exists: "There are three plans. The Free plan includes..."
4. User understands immediately (no verification needed)
**Why it works:** Reading the DOM is like running `EXPLAIN ANALYZE`—you're observing reality, not predicting it.
## The Three Optimization Principles That Apply to Demo Guidance
### Principle #1: Read Constraints to Eliminate Impossible Conditions
**In PostgreSQL:**
Enable `constraint_exclusion` so the database reads check constraints:
```sql
SELECT * FROM users WHERE plan = 'Pro'; -- Lowercase 'pro' required
```
Database reads constraint: `CHECK (plan IN ('free', 'pro'))`. Condition impossible. Skip scan.
**In Demo Guidance:**
Voice AI reads semantic constraints on the page:
```html
<label for="email">Email</label>
<input type="email" id="email" required>
```
User asks: "Is email optional?"
Voice AI reads `required` attribute. Answer: "Email is required." No guessing needed.
### Principle #2: Read Actual Cardinality, Not Assumed Precision
**In PostgreSQL:**
Don't index millisecond precision when users query by day:
```sql
-- Too precise (214 MB index)
CREATE INDEX sale_sold_at_ix ON sale(sold_at);
-- Right precision (66 MB index)
CREATE INDEX sale_sold_at_date_ix
ON sale((date_trunc('day', sold_at))::date);
```
Read actual query patterns. Optimize for the granularity users actually need.
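Actual query patterns can be read rather than assumed, for example with the `pg_stat_statements` extension (assuming it is installed and enabled):

```sql
-- Requires: CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```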
**In Demo Guidance:**
Don't generate exhaustive feature lists when users need high-level categories:
**Chatbot approach:**
User: "What features do you have?"
Chatbot: *Generates 3000-word comprehensive feature list from training data*
**Voice AI approach:**
User: "What features do you have?"
Voice AI reads page: "The navigation shows five main categories: Dashboard, Analytics, Integrations, Settings, Support."
Read actual page organization. Provide the level of detail that exists on the page.
### Principle #3: Read Hash Values for Large, Unique Data
**In PostgreSQL:**
Don't store entire URLs in B-Tree indexes:
```sql
-- Stores full URLs (154 MB index)
CREATE UNIQUE INDEX urls_url_unique_ix ON urls(url);
-- Stores hash values (32 MB index)
ALTER TABLE urls
ADD CONSTRAINT urls_url_unique_hash
EXCLUDE USING HASH (url WITH =);
```
Hash values are fixed-size, URLs are variable-size. Hash index is 5x smaller.
**In Demo Guidance:**
Don't regenerate entire page content in responses:
**Chatbot approach:**
User: "How do I export data?"
Chatbot: *Generates step-by-step guide based on training data, potentially outdated or inaccurate*
**Voice AI approach:**
User: "How do I export data?"
Voice AI reads button label: "There's an 'Export to CSV' button in the top-right corner."
Reference actual page elements (pointers to DOM) instead of regenerating content (full descriptions).
## Why Reading EXPLAIN ANALYZE Is Like Reading the DOM
The article reveals what `EXPLAIN ANALYZE` actually does:
```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE plan = 'Pro';
```
**Output structure:**
- **Operation type:** Sequential Scan / Index Scan / Hash Join
- **Actual rows:** How many rows actually processed
- **Actual time:** How long execution actually took
- **Filter conditions:** What was actually checked
**This is metadata about query execution—exactly like DOM metadata about page structure.**
**DOM equivalent:**
```html
<section aria-label="Pricing Plans">
  <h3>Free Plan</h3>
  <ul>
    <li>10 users included</li>
    <li>2 GB storage</li>
  </ul>
</section>
```
**DOM metadata:**
- **Element type:** `<section>` (not generated, exists in HTML)
- **Semantic label:** "Pricing Plans" (`aria-label`, actual content)
- **Structure:** Heading + list (actual hierarchy)
- **Content:** "10 users included" (actual text, not hallucinated)
**The parallel:** `EXPLAIN ANALYZE` reads query execution metadata. Voice AI reads DOM metadata. Both observe reality instead of predicting it.
## The Three Mistakes Database Beginners Make (And Demo Chatbots Repeat)
### Mistake #1: Optimizing Without Measuring
**Database beginner:**
- Query is slow
- Add index blindly
- Don't run `EXPLAIN ANALYZE` to verify improvement
- Result: Wasted storage, no performance gain
**Chatbot demo:**
- User confused
- Generate more detailed response
- Don't read page to verify accuracy
- Result: More hallucinations, no clarity gain
### Mistake #2: Indexing Everything "Just in Case"
**Database beginner:**
- Worried about slow queries
- Create indexes on every column
- Don't read actual query patterns
- Result: Massive storage cost, slower writes, minimal read improvement
**Chatbot demo:**
- Worried about user questions
- Train on every possible topic
- Don't read what's actually on the page
- Result: Generates answers about features that don't exist
### Mistake #3: Trusting Estimates Over Measurements
**Database beginner:**
- Query planner estimates 10 rows
- Actually returns 100,000 rows
- Trusts estimate, doesn't measure actual execution
- Result: Wrong optimization decisions
**Chatbot demo:**
- Training data suggests feature exists
- Actually removed 6 months ago
- Trusts training data, doesn't read current page
- Result: Tells user about feature that doesn't exist
## Why Conventional Optimization Fails (And Why Chatbot Demos Fail)
The article reveals why "slapping a B-Tree on it" is the conventional approach:
**Why developers do this:**
1. Query is slow
2. Everyone says "add an index"
3. Add B-Tree index on the filtered column
4. Query gets faster
5. Done
**Why it's suboptimal:**
- Index size not considered (214 MB for timestamps when 66 MB date index would work)
- Index maintenance cost ignored (slower inserts/updates)
- Storage cost ignored (paying for precision you don't need)
- Alternative approaches not explored (hash indexes, partial indexes, expression indexes)
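As one example of an unexplored alternative, a partial index covers only the rows matching a predicate (a sketch against the `sale` table, with an assumed cutoff date):

```sql
-- Index only recent sales if reports rarely touch older data.
CREATE INDEX sale_recent_ix
ON sale (sold_at)
WHERE sold_at >= '2025-01-01 UTC';
```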
**The chatbot parallel:**
**Why teams build chatbot demos:**
1. Users need guidance
2. Everyone says "add AI chatbot"
3. Add LLM that generates responses
4. Users can ask questions
5. Done
**Why it's suboptimal:**
- Hallucination risk not considered (generates features that don't exist)
- Maintenance cost ignored (retraining when features change)
- Accuracy cost ignored (users must verify every response)
- Alternative approaches not explored (read DOM directly instead of generating responses)
## The Virtual Generated Column Principle
The article introduces PostgreSQL 18's virtual generated columns as the solution to the "discipline problem":
**The problem:** Function-based indexes require exact expression match:
```sql
-- Index defined as:
CREATE INDEX sale_sold_at_date_ix
ON sale((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
-- Query MUST use exact same expression to use index:
WHERE date_trunc('day', sold_at AT TIME ZONE 'UTC')::date = '2025-01-15' -- Works
-- Slightly different expression won't use index:
WHERE (sold_at AT TIME ZONE 'UTC')::date = '2025-01-15' -- Full table scan!
```
**The solution:** Virtual generated column:
```sql
ALTER TABLE sale
ADD sold_at_date DATE
GENERATED ALWAYS AS ((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
CREATE INDEX sale_sold_at_date_ix ON sale(sold_at_date);
```
Now queries use the column name:
```sql
WHERE sold_at_date = '2025-01-15' -- Always uses index!
```
**The principle:** Create a canonical representation that guarantees the right expression is used.
**The parallel to demos:**
**The problem:** Different pages describe the same feature differently:
- Homepage: "Real-time analytics dashboard"
- Features page: "Analytics & Reporting"
- Docs: "Dashboard - Analytics Module"
Chatbot must guess which term to use, which description is current.
**The solution:** Voice AI reads what's actually on the current page:
- On homepage: Reads "Real-time analytics dashboard"
- On features page: Reads "Analytics & Reporting"
- On docs: Reads "Dashboard - Analytics Module"
No canonical representation needed—just read what exists on each page.
## The Constraint Exclusion Insight
The article reveals the most powerful optimization: `constraint_exclusion`.
**What it does:** Before executing a query, PostgreSQL reads table constraints and eliminates conditions that can never be true.
**Example:**
```sql
-- Constraint: plan must be 'free' or 'pro'
CONSTRAINT plan_check CHECK (plan IN ('free', 'pro'))
-- Query for 'Pro' (capital P):
SELECT * FROM users WHERE plan = 'Pro';
```
**Without `constraint_exclusion`:**
- Scan all 100,000 rows
- Check each row: does `plan = 'Pro'`?
- Find 0 matches
- Time: 7.4ms
**With `constraint_exclusion = on`:**
- Read constraint: `plan IN ('free', 'pro')`
- Check query: `plan = 'Pro'`
- Realize: 'Pro' not in allowed values
- Skip scan entirely
- Time: 0.008ms (1000x faster!)
**The insight:** Reading constraints (metadata about valid data) is faster than scanning data itself.
**The parallel to demos:**
**Without reading DOM:**
- User asks: "Can I add unlimited users?"
- Chatbot generates answer from training data: "The Enterprise plan includes unlimited users."
- User must verify against page
- If wrong, wasted time
**With reading DOM:**
- User asks: "Can I add unlimited users?"
- Voice AI reads pricing table:
```html
<tr><th scope="row">Users</th><td>Up to 100</td></tr>
```
- Voice AI: "The plan includes up to 100 users."
- Accurate immediately (reading actual page metadata)
## The Three Reasons Unconventional Optimizations Work
### Reason #1: They Read Multiple Levels of Reality
**Conventional optimization:** Read query text only
**Unconventional optimization:**
- Read query text
- Read execution plan (`EXPLAIN ANALYZE`)
- Read table statistics
- Read constraints
- Read data distribution
**More data sources = more accurate optimizations**
**The parallel to demos:**
**Chatbot:** Reads training data only
**Voice AI:**
- Reads page HTML
- Reads ARIA labels
- Reads heading hierarchy
- Reads form structure
- Reads button labels
**More data sources = more accurate guidance**
### Reason #2: They Measure Actual Execution, Not Predicted Cost
**Query planner estimates:**
- Estimated rows: 10
- Estimated cost: 100
- Estimated time: 50ms
**Actual execution (`EXPLAIN ANALYZE`):**
- Actual rows: 100,000
- Actual cost: 10,000
- Actual time: 5000ms
**The problem:** Estimates can be wildly wrong. Only measurement reveals reality.
**The parallel to demos:**
**Chatbot prediction:**
- Predicts user wants feature overview
- Generates comprehensive description
- Predicted helpfulness: high
**Actual user need:**
- User looking at specific button
- Wants to know what clicking it does
- Actual helpfulness: low (irrelevant response)
**Voice AI measurement:**
- Reads which page section user is viewing
- Describes elements in that section
- Measured helpfulness: high (contextually relevant)
### Reason #3: They Optimize for Actual Usage Patterns, Not Theoretical Needs
**The timestamp index example:**
**Theoretical need:** "Users might want to query at any precision"
**Actual usage:** Analysts only produce daily reports
**Theoretical optimization:** Index full timestamp (214 MB)
**Actual optimization:** Index just the date (66 MB, faster)
**The parallel to demos:**
**Theoretical need:** "Users might ask any question about the product"
**Actual usage:** Users ask about elements visible on current page
**Theoretical optimization:** Chatbot trained on entire documentation
**Actual optimization:** Voice AI reads current page structure
## The Verdict: Database Experts Read Reality, They Don't Guess
The HN article proves database optimization requires reading multiple levels of reality:
**Level 1:** Read query execution (`EXPLAIN ANALYZE`)
**Level 2:** Read table statistics (cardinality, distribution)
**Level 3:** Read constraints (data rules)
**Level 4:** Read actual usage patterns (query logs)
**The lesson:** You can't optimize what you don't measure. You can't measure without reading reality.
**The parallel to demo guidance:**
**Level 1:** Read page structure (DOM tree)
**Level 2:** Read semantic metadata (ARIA labels, heading hierarchy)
**Level 3:** Read element properties (`required`, `disabled`, `href`)
**Level 4:** Read user context (which section they're viewing)
**The lesson:** You can't guide what you don't read. You can't read without accessing the DOM.
## The Alternative: Guessing Performance Without Measurement
Imagine if database optimization worked like chatbot demos:
**Hypothetical "AI Database Optimizer":**
1. Developer: "This query is slow"
2. AI: *Predicts* query might need an index
3. AI: *Generates* index suggestion based on training data
4. Developer must run `EXPLAIN ANALYZE` to verify if suggestion helps
5. If wrong, wasted storage and time
**Why this would fail:** You need to measure actual execution to optimize. Predictions without measurement are guesses.
**Current chatbot demos work this way:**
1. User: "How does this work?"
2. Chatbot: *Predicts* user wants feature overview
3. Chatbot: *Generates* description based on training data
4. User must check page to verify if description is accurate
5. If wrong, wasted time and confusion
**Why this fails:** You need to read actual page content to guide. Predictions without reading are hallucinations.
**Voice AI works like EXPLAIN ANALYZE:**
1. User: "How does this work?"
2. Voice AI: *Reads* page structure
3. Voice AI: *Describes* what actually exists
4. User understands immediately (no verification needed)
5. Always accurate (reading reality)
## The Pattern: Read First, Optimize Second
The article proves the fundamental pattern of database optimization:
**You can't optimize what you haven't measured.**
Every technique follows this sequence:
1. Identify slow query
2. **Run `EXPLAIN ANALYZE`** (read actual execution)
3. **Read table statistics** (understand data distribution)
4. **Read constraints** (understand data rules)
5. **Based on readings**, choose optimization approach
6. Implement optimization
7. **Run `EXPLAIN ANALYZE` again** (verify improvement)
**The parallel to demo guidance:**
**You can't guide what you haven't read.**
Every interaction follows this sequence:
1. User asks question
2. **Read DOM structure** (understand page content)
3. **Read semantic metadata** (understand element purpose)
4. **Read user context** (understand which section they're viewing)
5. **Based on readings**, provide relevant guidance
6. Describe what exists
7. **User understands immediately** (no verification needed)
---
*Demogod's voice AI reads your site's DOM structure like `EXPLAIN ANALYZE` reads query execution—observing reality instead of predicting it. One line of code. Zero hallucinations. [Try it on your site](https://demogod.me).*