DevgainsDevgainsDevgains
All articles

Postgres JSONB: When to Use It and When It Bites

·6 min read
Postgres JSONB: When to Use It and When It Bites

Photo: Unsplash

PostgreSQL's JSONB type is one of those features that feels like a superpower the first week and a liability the first time you have to migrate it. It lets you drop semi-structured data into a relational database without designing a schema, query into it with real operators, and even index it. Used well, it is genuinely excellent. Used as a substitute for thinking about your data model, it quietly becomes the thing that bites you eighteen months later.

The skill is not "JSONB good" or "JSONB bad." It is knowing which of your columns deserve the flexibility and which are columns pretending to be JSON. Let us draw that line clearly.

JSON vs JSONB: pick JSONB

First, a quick disambiguation, because Postgres has two JSON types. json stores an exact text copy of the input, preserving whitespace and key order, and re-parses it on every access. jsonb stores a decomposed binary form: parsing happens once on write, and reads are fast. jsonb also supports indexing and a rich operator set.

For virtually all application use, you want jsonb. The only real reason to reach for json is if you must preserve the exact textual representation, which is rare. The PostgreSQL JSON types documentation lays out the distinction in full.

CREATE TABLE products (
  id          bigserial PRIMARY KEY,
  sku         text NOT NULL,
  name        text NOT NULL,
  price_cents int  NOT NULL,
  attributes  jsonb NOT NULL DEFAULT '{}'
);
 
INSERT INTO products (sku, name, price_cents, attributes) VALUES
  ('TS-001', 'Wool Coat', 24900,
   '{"color": "charcoal", "sizes": ["S","M","L"], "waterproof": true}');

Where JSONB genuinely shines

JSONB earns its place when the data is legitimately variable in shape, or when imposing columns would be premature or impossible:

  • Heterogeneous attributes. A products table where each category has different fields: coats have waterproof, mugs have volume_ml. Modeling every possible attribute as a column gives you a sparse mess; one attributes jsonb column handles them all.
  • External payloads you store as-is. Webhook bodies, third-party API responses, audit snapshots. The shape is owned by someone else and may change without warning.
  • User-defined or plugin data. Custom form fields, feature flags, per-tenant settings where you cannot predict the keys.

In all three, the defining trait is the same: the structure is genuinely unknown, variable, or not yours to control. That is the right home for JSONB.

-- Real querying into the document, not just blob storage
SELECT sku, name
FROM products
WHERE attributes ->> 'color' = 'charcoal'        -- ->> extracts text
  AND attributes @> '{"waterproof": true}';       -- @> contains

The ->>, ->, @>, and ? operators make JSONB a queryable structure, not an opaque blob. That is exactly what separates it from stuffing a JSON string into a text column.

Indexing JSONB: the GIN index

The objection "but querying inside JSON must be slow" is answered by the GIN index. A GIN (Generalized Inverted Index) on a jsonb column lets containment and key-existence queries use an index instead of scanning every row and parsing every document.

-- Index the whole document for containment (@>) and key (?) queries
CREATE INDEX idx_products_attributes
  ON products USING gin (attributes);
 
EXPLAIN ANALYZE
SELECT sku FROM products
WHERE attributes @> '{"waterproof": true}';
                            QUERY PLAN
------------------------------------------------------------------
 Bitmap Heap Scan on products  (cost=12.00..156.21 rows=48 width=12)
   Recheck Cond: (attributes @> '{"waterproof": true}'::jsonb)
   ->  Bitmap Index Scan on idx_products_attributes
         (cost=0.00..11.99 rows=48 width=0)
         Index Cond: (attributes @> '{"waterproof": true}'::jsonb)
 Planning Time: 0.180 ms
 Execution Time: 0.402 ms

That Bitmap Index Scan is the GIN index doing its job. There is also a more compact jsonb_path_ops GIN variant that is smaller and faster for containment queries at the cost of supporting fewer operators. The JSON indexing documentation covers both.

GIN indexes make JSONB reads fast, but they are larger and slower to update than B-tree indexes. On a write-heavy JSONB column, the index maintenance cost is real. Index the column because you query it often, not reflexively.

Where JSONB bites

Now the failure modes, because this is where teams get hurt.

1. Using JSONB for data that is obviously relational. If every row has the same keys and you constantly filter, join, and aggregate on them, those are columns wearing a JSON costume. You lose NOT NULL, foreign keys, type checking, and clean B-tree indexes, and you gain nothing. A price you sort by belongs in a typed column, not in attributes ->> 'price'.

2. No schema means no guarantees. The database will happily store {"color": "charcoal"} in one row and {"colour": 7} in the next. Typos, type drift, and missing keys are caught nowhere. With real columns, the database enforces shape; with JSONB, your application code becomes the only schema, and it is never as thorough as you think.

3. Awkward, fragile querying for relational access. Pulling a value out of JSON requires casting, and casts are easy to get wrong:

-- Sorting by a numeric value stored in JSON: note the explicit cast.
-- Forget it, and you sort "100" before "20" as text. Subtle and nasty.
SELECT sku FROM products
ORDER BY (attributes ->> 'price_cents')::int DESC;

4. Painful migrations. Renaming a key or changing a value's type across millions of JSONB documents means a bulk UPDATE rewriting every row, with no column-level tooling to help. A real column rename is metadata; a JSON key rename is a data migration.

5. Update write amplification. Updating one key in a large JSONB document rewrites the entire document, because Postgres stores the value as a whole. Frequent small updates to big documents generate far more write and bloat than updating a narrow column would.

A practical decision rule

Ask one question of each piece of data: is this shape known and stable, and do I query or constrain it relationally?

  • Yes to both: make it a real, typed column. You get integrity, clean indexing, and cheap migrations.
  • No (the shape is variable, external, or user-defined): JSONB is the right tool, and a GIN index makes it fast.

The hybrid is usually best: promote the handful of fields you always filter and sort on into typed columns, and keep the long, variable tail in a jsonb column. That gives you relational integrity where it matters and document flexibility where it helps, instead of forcing everything into one model.

Takeaways

  • Prefer jsonb over json for nearly all uses: it parses once, reads fast, and supports indexing and rich operators.
  • JSONB excels for genuinely variable, external, or user-defined data, where a fixed schema is premature or impossible.
  • A GIN index makes containment (@>) and key (?) queries fast, but costs more on writes than a B-tree, so index because you query, not by reflex.
  • It bites when used for stable relational data: you lose constraints, type safety, clean indexing, and easy migrations, and updates rewrite whole documents.
  • The strong default is a hybrid model: typed columns for the fields you filter and sort on, a jsonb column for the variable tail.
6 min read

Read next