Database Indexes Are Not Magic: A Mental Model

Most engineers learn indexes as a ritual: a query is slow, someone says "add an index," the query gets fast, everyone moves on. That works until it doesn't, and when it stops working you have no model to reason about why. The index you added is ignored. The write that used to take a millisecond now takes ten. The query planner picks a sequential scan even though you "did everything right."

Indexes are not magic, and treating them as magic is exactly what leads to these surprises. Underneath, an index is a plain old data structure with predictable behavior. Once you can picture that structure, most index questions answer themselves. Indexing is the second lever in our Postgres performance guide; this piece is the mental model that makes the planner's choices stop feeling arbitrary.

The phone book in your database

The classic analogy is the phone book, and it survives because it is accurate. A phone book is sorted by last name, then first name. Finding "Okafor, Maya" is fast because you can binary-search: open the middle, decide left or right, repeat. Finding everyone whose phone number ends in 4242 is hopeless, because the book is not ordered by phone number. You would read every entry.

A B-tree index is the same idea generalized. It keeps a sorted, balanced tree of key values, each pointing back to the row's physical location. Looking up a value is a walk from the root down to a leaf, a handful of page reads even in a table with hundreds of millions of rows. That logarithmic cost is the entire reason indexes feel fast.

The critical consequence: an index is only useful when your query's access pattern matches the index's order. Equality and range queries on the leading columns are cheap. Queries that ignore that order are not.

CREATE INDEX idx_users_last_first
  ON users (last_name, first_name);
 
-- Uses the index: filtering on the leading column
SELECT * FROM users WHERE last_name = 'Okafor';
 
-- Uses the index: range scan on the leading column
SELECT * FROM users WHERE last_name BETWEEN 'A' AND 'M';
 
-- Cannot use the index efficiently: first_name is not the leading column
SELECT * FROM users WHERE first_name = 'Maya';

That last query is the phone-number problem. The index is sorted by last_name first, so knowing only first_name gives the planner no way to narrow the search.

Why composite column order is not arbitrary

Because the tree is sorted left to right, a composite index (a, b, c) behaves like a phone book sorted by a, then b, then c. You get efficient access for a, for a AND b, and for a AND b AND c. You do not get efficient access for b alone, or c alone. This is the "leftmost prefix" rule, and it is the single most useful thing to internalize about composite indexes.

Rule of thumb for composite indexes: put equality-filter columns first, then the column you range-scan or sort on. An index on (tenant_id, created_at) serves "this tenant's rows, newest first" beautifully; reversed, it serves it poorly.

Markus Winand's Use The Index, Luke is the canonical, vendor-neutral walkthrough of this reasoning, and it is worth reading end to end if you write SQL for a living.

The cost you do not see: writes

Here is the part the "just add an index" advice omits. Every index is a second copy of part of your data that has to be kept consistent. When you INSERT, UPDATE a indexed column, or DELETE, the database must update every affected index, keeping each tree balanced and sorted.

-- One logical write...
INSERT INTO orders (id, customer_id, status, total)
VALUES (gen_random_uuid(), 42, 'pending', 99.50);
 
-- ...but if orders has four indexes, that is up to five
-- structures to modify in one transaction: the table heap
-- plus each index that includes a column being written.

A table with ten indexes is a table where every write does roughly ten times the index bookkeeping. On a write-heavy table this is the difference between keeping up and falling behind. Indexes are a read-optimization that you pay for on every write, plus the disk and memory they occupy. The right number of indexes is the smallest set that covers your real query patterns, not "one per column just in case."

Why the planner sometimes ignores your index

Newcomers assume the database always uses an index if one exists. It does not, and that is correct behavior. The query planner is a cost estimator: it compares the predicted cost of an index scan against a sequential scan and picks the cheaper one.

If your WHERE clause matches 80% of the table, an index scan means jumping all over the disk to fetch most of the rows anyway, plus reading the index. A straight sequential scan of the table is cheaper. The planner knows this from its table statistics, which is why ANALYZE and accurate statistics matter so much.

-- See what the planner actually decided and why
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';

                            QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..18334.00 rows=920000 width=64)
   Filter: (status = 'pending')
   Rows Removed by Filter: 80000
   actual time=0.011..142.337 rows=920000 loops=1
 Planning Time: 0.094 ms
 Execution Time: 198.451 ms

If most rows are pending, that sequential scan is the planner being smart, not broken. Indexes pay off when they are selective: when they let you skip the overwhelming majority of rows. A column with two possible values is a poor index candidate; a high-cardinality column like email or user_id is an excellent one. PostgreSQL's own indexes documentation covers selectivity and the planner's cost model in depth.

A mental model you can carry

Hold these four pictures in your head and most index decisions become routine:

An index is a sorted, separate copy of some columns, pointing back to rows. Reads that follow the sort order are cheap; reads that fight it are not.
Composite indexes are sorted left to right. You can use a leftmost prefix of the columns, nothing else.
Every index taxes every write and consumes storage. Fewer, well-chosen indexes beat many speculative ones.
The planner chooses by estimated cost. A selective predicate makes an index worthwhile; a non-selective one makes a sequential scan the better choice.

None of this requires memorizing planner internals. It requires picturing the data structure and asking, "Does my query's access pattern match this index's order, and is it selective enough to be worth the random I/O?"

Takeaways

Indexes are B-trees: sorted, balanced structures that turn linear scans into logarithmic lookups, but only when the query matches their order.
Composite index column order is governed by the leftmost-prefix rule. Lead with equality columns, follow with range or sort columns.
Indexes are not free. Each one adds write cost and storage, so build the minimum set that covers your actual queries.
The planner ignoring your index is usually correct: on non-selective predicates a sequential scan really is cheaper.
Use EXPLAIN ANALYZE to see reality instead of guessing, and lean on Use The Index, Luke and the PostgreSQL docs when intuition runs out.

Database Indexes Are Not Magic: A Mental Model

The phone book in your database

Why composite column order is not arbitrary

The cost you do not see: writes

Why the planner sometimes ignores your index

A mental model you can carry

Takeaways

Read next

PostgreSQL Performance: A Practical Guide to Indexes, Query Plans, and Pooling

Index-Only Scans, Explained With Real Query Plans

Postgres JSONB: When to Use It and When It Bites