The Auth Bug That Took Down Our Entire Weekend

It started, as these things do, with a support ticket that didn't make sense. A customer reported seeing an invoice that wasn't theirs. By the time the second ticket landed, we knew it wasn't a fluke. Someone had incremented an ID in a URL and walked straight into another tenant's billing history. That single missing check ate our entire weekend.

This is the post-mortem I wish I'd read a year earlier. The bug was not exotic. It was broken access control, the category that sits at the very top of the OWASP Top 10. It is one of the most common serious flaws in modern web apps, and it almost never shows up as a crash or a stack trace. The code works perfectly; it just works for the wrong person.

The Bug in One Screenshot

Our invoice endpoint looked reasonable. It authenticated the request, loaded the record, and returned it. The problem is that "reasonable" was doing a lot of hiding.

// What we shipped — authenticated, but not authorized
app.get('/api/invoices/:id', requireLogin, async (req, res) => {
  const invoice = await db.invoices.findById(req.params.id);
  if (!invoice) return res.status(404).end();
  res.json(invoice); // returns ANY invoice to ANY logged-in user
});

requireLogin confirmed who you were. Nothing confirmed what you were allowed to see. Authentication answers "are you logged in." Authorization answers "are you allowed to touch this object." We had conflated the two, and the gap between them was a direct object reference an attacker could iterate by hand. This specific shape — using a user-supplied ID to fetch a record without an ownership check — is what PortSwigger documents as Insecure Direct Object References (IDOR), a subclass of broken access control.
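To make the gap concrete, the sketch below models the shipped handler without Express. It is an illustration under stated assumptions, not our production code: the `sessions` and `invoices` maps and the function names are hypothetical stand-ins for the session layer, `requireLogin`, and `db.invoices.findById`. Logged in as user B, a plain loop over sequential IDs collects user A's invoices.

```javascript
// Hypothetical stand-ins: token -> user (the session layer) and id -> invoice (the table).
const sessions = new Map([
  ['token-a', { id: 'user-a' }],
  ['token-b', { id: 'user-b' }],
]);

const invoices = new Map([
  [1, { id: 1, ownerId: 'user-a', total: 120 }],
  [2, { id: 2, ownerId: 'user-a', total: 75 }],
  [3, { id: 3, ownerId: 'user-b', total: 40 }],
]);

// Authentication only: who is calling? Returns the user or null.
function requireLogin(token) {
  return sessions.get(token) ?? null;
}

// Model of the vulnerable handler: authenticated, never authorized.
function getInvoiceVulnerable(token, id) {
  const user = requireLogin(token);
  if (!user) return { status: 401 };
  const invoice = invoices.get(id);
  if (!invoice) return { status: 404 };
  return { status: 200, body: invoice }; // no ownership check
}

// An attacker with a valid session iterates IDs and keeps what isn't theirs.
function enumerate(token, maxId) {
  const me = requireLogin(token);
  const leaked = [];
  for (let id = 1; id <= maxId; id++) {
    const res = getInvoiceVulnerable(token, id);
    if (res.status === 200 && res.body.ownerId !== me.id) leaked.push(res.body.id);
  }
  return leaked;
}

console.log(enumerate('token-b', 5)); // [ 1, 2 ]
```

Every request in that loop passes authentication and returns an ordinary 200, which is why this class of bug rarely surfaces as an error.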

Why Our Tests Were Green

Every test passed. That's the unsettling part. Our suite created a user, logged them in, requested their own invoice, and asserted a 200. It never created a second user and tried to read the first user's data. We tested the happy path of ownership and called it coverage.

Access control bugs are invisible to tests that only ever act as the resource owner. The negative case — a different, authenticated user being correctly denied — is the test that actually proves the control exists.

// The test we should have had from day one
test('user B cannot read user A invoice', async () => {
  const a = await createUser();
  const b = await createUser();
  const invoice = await createInvoice({ ownerId: a.id });
 
  const res = await request(app)
    .get(`/api/invoices/${invoice.id}`)
    .set('Authorization', bearer(b.token));
 
  expect(res.status).toBe(404); // not 200, not 403-with-body
});

"Authenticated" is not "authorized." A logged-in user is still an attacker for every object they don't own. Every endpoint that loads a record by a client-supplied identifier must verify ownership or role on the server, on every request — never trust the client to scope the query for you.
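Because one missing check usually means sibling endpoints are missing it too, the negative case is worth running over every route as a table rather than one hand-written test at a time. A minimal sketch, assuming each endpoint can be modeled as a function from (actor, resource ID) to an HTTP status; the `/api/payments/:id` route, the fixtures, and `findLeakyEndpoints` are hypothetical. In a real suite each handler call would be an HTTP request, for example through supertest as in the test above.

```javascript
// Hypothetical fixtures: both records belong to user-a.
const records = new Map([
  ['inv-1', { ownerId: 'user-a' }],
  ['pay-1', { ownerId: 'user-a' }],
]);

const isOwner = (actor, id) => records.get(id)?.ownerId === actor;

// Hypothetical endpoints modeled as (actorId, resourceId) => HTTP status.
const endpoints = [
  {
    name: 'GET /api/invoices/:id',
    resourceId: 'inv-1',
    handler: (actor, id) => (isOwner(actor, id) ? 200 : 404),
  },
  {
    name: 'GET /api/payments/:id',
    resourceId: 'pay-1',
    // A sibling endpoint that forgot the ownership check.
    handler: (actor, id) => (records.has(id) ? 200 : 404),
  },
];

function findLeakyEndpoints(endpoints, ownerId, otherId) {
  const leaky = [];
  for (const { name, resourceId, handler } of endpoints) {
    // Positive case first, so a broken fixture cannot pass as "denied".
    if (handler(ownerId, resourceId) !== 200) {
      throw new Error(`${name}: owner was denied its own record`);
    }
    // Negative case: a different authenticated user must get 404.
    if (handler(otherId, resourceId) !== 404) leaky.push(name);
  }
  return leaky;
}

console.log(findLeakyEndpoints(endpoints, 'user-a', 'user-b'));
// [ 'GET /api/payments/:id' ]
```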

The Timeline

Here is roughly how the weekend went, because the response matters as much as the bug.

Saturday 09:14 — Second ticket arrives. We connect the dots and realize it's cross-tenant data exposure.
09:40 — We pull access logs and grep for sequential ID access patterns to estimate blast radius.
11:20 — Hotfix deployed: an ownership check added to the offending handler.
14:00 — Audit of every sibling endpoint, because if one was wrong, others probably were too. Three more had the same gap.
Sunday — Notifications, log review, and writing the policy that became this article.

# Rough triage: how many times did each client IP request each invoice path?
# Assumes Common Log Format, where $1 is the client IP and $7 the request path.
# (illustrative; real analysis used our structured audit log)
grep "GET /api/invoices/" access.log \
  | awk '{print $1, $7}' \
  | sort | uniq -c | sort -rn | head

The investigation took longer than the fix. That's normal. The expensive part of a broken-access-control incident is proving what didn't get accessed.
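With a structured audit log, the question can be asked directly instead of inferred from request counts: which reads were made by someone other than the invoice's owner? A minimal sketch, assuming one log entry per invoice read with hypothetical `actorId` and `invoiceId` fields, and an `ownerOf` map loaded from the invoices table. It uses current ownership, so records that changed hands need separate handling, and an empty result only proves non-access if the log covers every read path.

```javascript
// Hypothetical audit-log entries: one per invoice read.
const auditLog = [
  { actorId: 'user-b', invoiceId: 3 },
  { actorId: 'user-b', invoiceId: 1 },
  { actorId: 'user-b', invoiceId: 2 },
  { actorId: 'user-a', invoiceId: 1 },
  { actorId: 'user-b', invoiceId: 1 },
];

// Current owner of each invoice, e.g. loaded from the invoices table.
const ownerOf = new Map([
  [1, 'user-a'],
  [2, 'user-a'],
  [3, 'user-b'],
]);

// Collect, per actor, the distinct invoices they read but do not own.
function crossTenantReads(log, ownerOf) {
  const byActor = new Map();
  for (const { actorId, invoiceId } of log) {
    const owner = ownerOf.get(invoiceId);
    if (owner === undefined || owner === actorId) continue; // unknown or own record
    if (!byActor.has(actorId)) byActor.set(actorId, new Set());
    byActor.get(actorId).add(invoiceId);
  }
  return byActor;
}

const exposed = crossTenantReads(auditLog, ownerOf);
console.log([...exposed.get('user-b')]); // [ 1, 2 ]
```

Mapping each exposed invoice back through `ownerOf` gives the list of tenants to notify; legitimate cross-tenant readers such as support staff would need to be filtered out first.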

How We Closed the Whole Class

A one-line hotfix stops one bug. We wanted to retire the category. Three changes did most of the work.

Scope every query by the actor

The cleanest fix makes unauthorized access structurally impossible: filter by the owner in the query itself, so "not found" and "not yours" collapse into the same 404.

app.get('/api/invoices/:id', requireLogin, async (req, res) => {
  const invoice = await db.invoices.findOne({
    id: req.params.id,
    ownerId: req.user.id, // ownership is part of the lookup
  });
  if (!invoice) return res.status(404).end();
  res.json(invoice);
});
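The same idea can be pushed one layer down so handlers never see an unscoped lookup at all. A minimal sketch, assuming a repository object bound to the acting user; `invoicesFor`, `invoiceRows`, and `getInvoice` are hypothetical names, and the array stands in for a real table.

```javascript
// Hypothetical in-memory table standing in for db.invoices.
const invoiceRows = [
  { id: 1, ownerId: 'user-a', total: 120 },
  { id: 2, ownerId: 'user-b', total: 40 },
];

// Handlers receive a repository already bound to the acting user,
// so there is no unscoped lookup for them to call by mistake.
function invoicesFor(actor) {
  return {
    findById(id) {
      // Ownership is part of the lookup: not-found and not-yours both yield null.
      return invoiceRows.find((row) => row.id === id && row.ownerId === actor.id) ?? null;
    },
    list() {
      return invoiceRows.filter((row) => row.ownerId === actor.id);
    },
  };
}

// Handler body modeled as a plain function returning status and body.
function getInvoice(actor, id) {
  const invoice = invoicesFor(actor).findById(id);
  return invoice ? { status: 200, body: invoice } : { status: 404 };
}

console.log(getInvoice({ id: 'user-b' }, 1).status); // 404 (exists, not yours)
console.log(getInvoice({ id: 'user-b' }, 99).status); // 404 (does not exist)
console.log(getInvoice({ id: 'user-b' }, 2).status); // 200
```

Code that only ever receives `invoicesFor(req.user)` cannot express the unscoped query. Express delivers `req.params.id` as a string, so a real version would normalize the ID type before comparing.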

Deny by default

We moved to a model where access is denied unless a rule explicitly grants it, following the guidance in the OWASP Authorization Cheat Sheet. New endpoints start locked. You have to opt a role in, which means a forgotten check fails closed instead of open.
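A deny-by-default gate can be as small as a lookup that returns false whenever no rule exists. A minimal sketch, assuming routes are keyed by method and path; the `routeRoles` table, the role names, and the `/api/reports/export` route are hypothetical.

```javascript
// Hypothetical route table: each route must name the roles it admits.
const routeRoles = new Map([
  ['GET /api/invoices/:id', ['customer', 'support']],
  ['DELETE /api/invoices/:id', ['admin']],
  // 'GET /api/reports/export' is a new route with no entry yet.
]);

function isAllowed(route, user) {
  const roles = routeRoles.get(route);
  // Missing user, unknown route, or no rule: fail closed.
  if (!user || !roles) return false;
  return roles.includes(user.role);
}

console.log(isAllowed('GET /api/invoices/:id', { role: 'customer' })); // true
console.log(isAllowed('DELETE /api/invoices/:id', { role: 'customer' })); // false
console.log(isAllowed('GET /api/reports/export', { role: 'admin' })); // false: no rule yet
```

A role gate like this answers "may this kind of user call this route at all"; it does not replace the object-level ownership check from the scoped query.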

Centralize the decision

Scattered if (user.role === 'admin') checks rot. We pulled authorization into a single policy layer that every handler calls, so the rule lives in one auditable place instead of being copy-pasted with subtle variations.
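A central policy layer can be a single module of named rules plus one entry point. A minimal sketch, assuming rules receive the acting user and the already-loaded resource; `policies`, `authorize`, and the action names are hypothetical, and unknown actions are denied so the deny-by-default property carries over.

```javascript
// Hypothetical policy module: every authorization rule lives here.
const policies = new Map([
  ['invoice:read', (user, invoice) => invoice.ownerId === user.id || user.role === 'admin'],
  ['invoice:refund', (user) => user.role === 'admin'],
]);

// The one function every handler calls. No rule means no access.
function authorize(user, action, resource) {
  const rule = policies.get(action);
  if (!user || !rule) return false;
  return rule(user, resource) === true;
}

const invoice = { id: 1, ownerId: 'user-a' };
console.log(authorize({ id: 'user-a', role: 'customer' }, 'invoice:read', invoice)); // true
console.log(authorize({ id: 'user-b', role: 'customer' }, 'invoice:read', invoice)); // false
console.log(authorize({ id: 'user-c', role: 'admin' }, 'invoice:read', invoice)); // true
console.log(authorize({ id: 'user-a', role: 'customer' }, 'invoice:delete', invoice)); // false: no rule
```

Unlike the scoped query, this loads the record before checking it, so a handler should answer a denied read with the same 404 it uses for a missing record rather than confirming the invoice exists.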

What Actually Caused It

The root cause wasn't a careless engineer. It was a system that made the insecure version the easy version. The framework gave us requireLogin out of the box and nothing equivalent for object-level authorization, so the path of least resistance skipped it. When the secure thing is extra work, it gets skipped under deadline pressure — every time.

The fix that lasted wasn't the code. It was making ownership-scoped queries the default pattern in our codebase, adding the negative test to our template, and putting "can a different user reach this?" on the PR checklist. Culture and defaults prevent the next one; the hotfix only addressed the one we already had.

Takeaways

Broken access control is the #1 risk in the OWASP Top 10, and it passes every test that only acts as the resource owner.
Authentication ("who are you") is not authorization ("what may you touch"); enforce both, and enforce authorization on the server for every object.
Scope database queries by the acting user so unauthorized records are structurally unreachable, not just hidden by a check that can be forgotten.
Write the negative test: a second authenticated user must be denied. That test is the only proof the control exists.
Deny by default and centralize authorization so a forgotten rule fails closed, not open.
After an incident, fix the default that produced the bug — not just the single endpoint that got reported.