MongoDB’s flexible schema is its greatest strength and its most dangerous feature. You can start building without thinking about data modeling, and it will work. Then at some point - usually when your collection crosses a few million documents - the decisions you made early come due.

The anti-patterns that cause the most pain at scale are not subtle. They are patterns that are actively encouraged by MongoDB’s flexibility, but that have serious performance and operational consequences when data volume grows.

The Unbounded Array

The most catastrophic MongoDB anti-pattern: storing an array in a document that grows without limit.

// This looks fine
{
  _id: "user_123",
  name: "Alice",
  posts: ["post_1", "post_2", "post_3"]
}

// This is catastrophic at scale
{
  _id: "user_123",
  name: "Alice",
  posts: ["post_1", "post_2", ... "post_50000"] // 50K posts
}

MongoDB’s maximum document size is 16 MB. An array of 50,000 object IDs (12 bytes each) is 600 KB. Not catastrophic by itself, but:

  • Loading any user document loads all 50,000 post IDs
  • Updating the array (adding a post) requires loading and rewriting the entire document
  • Indexing arrays creates index entries per element - 50,000 index entries per active user
  • WiredTiger compression cannot compress random object IDs effectively
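The arithmetic above is easy to sketch (a back-of-envelope estimate of the raw ID payload, not an exact BSON document size):

```javascript
// Back-of-envelope cost of the 50K-element array (an estimate, not exact BSON size)
const OBJECT_ID_BYTES = 12;   // an ObjectId is 12 bytes
const POST_COUNT = 50000;

const arrayBytes = POST_COUNT * OBJECT_ID_BYTES; // raw ID payload alone
const indexEntries = POST_COUNT;                 // multikey index: one entry per element

console.log(arrayBytes / 1000 + " KB of IDs loaded on every read"); // 600 KB
console.log(indexEntries + " index entries for a single user");
```

And that 600 KB is paid again on every rewrite of the document, not just on reads.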

The fix is to model this as a separate collection with a reference:

// posts collection
{
  _id: "post_1",
  user_id: "user_123",
  content: "...",
  created_at: ISODate("2025-01-01")
}
// Index on user_id + created_at for efficient querying

This is the relational model, and it is correct for unbounded relationships. MongoDB’s document model is not an excuse to ignore cardinality.
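If you still want a small embedded list for fast access (say, the ten most recent post IDs), cap it on every write. MongoDB supports this directly via $push with the $each/$slice modifiers; the invariant it maintains, sketched in plain JavaScript:

```javascript
// In MongoDB, the bounded-array write looks like:
//   db.users.updateOne(
//     { _id: userId },
//     { $push: { recent_posts: { $each: [postId], $slice: -10 } } }
//   )
// The invariant that maintains, in plain JS:
function pushBounded(arr, item, limit = 10) {
  const next = [...arr, item];
  return next.slice(-limit); // keep only the newest `limit` entries
}

const recent = ["p1","p2","p3","p4","p5","p6","p7","p8","p9","p10"];
const updated = pushBounded(recent, "p11");
console.log(updated.length);  // 10
console.log(updated[0]);      // "p2" - oldest entry dropped
```

The full history lives in the posts collection; the embedded list is just a bounded cache.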

Missing or Wrong Indexes

MongoDB’s flexibility means you can query any field at any time. The performance implication: querying an unindexed field on a large collection does a full collection scan.

The classic mistake is building an application without thinking about indexes, then adding them reactively when queries become slow. By that point the collection might have 50 million documents, and building an index over it can take hours; modern MongoDB no longer blocks reads and writes during the build, but the build still competes for I/O and cache, so performance suffers while it runs.

Index design should happen before or alongside schema design, not after queries are slow.

Common missed indexes:

// You will query users by email
db.users.createIndex({ email: 1 }, { unique: true })

// You will query orders by status and sort by date
db.orders.createIndex({ status: 1, created_at: -1 })

// You will query events by user and time range
db.events.createIndex({ user_id: 1, timestamp: -1 })

// If you have text search needs
db.articles.createIndex({ title: "text", body: "text" })

Use explain("executionStats") to verify your queries are using indexes:

db.orders.find({ status: "pending" }).explain("executionStats")
// Look for: IXSCAN (good) vs COLLSCAN (bad)
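That check is easy to automate in a test suite. A sketch that walks a winning plan recursively (the plan shapes below follow the usual queryPlanner.winningPlan structure, but verify the exact fields against your server version):

```javascript
// Recursively walk an explain() plan and flag any collection scan
function usesCollScan(stage) {
  if (!stage) return false;
  if (stage.stage === "COLLSCAN") return true;
  if (usesCollScan(stage.inputStage)) return true;
  return (stage.inputStages || []).some(usesCollScan);
}

// Example plan shapes, as found under queryPlanner.winningPlan
const indexed = { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1_created_at_-1" } };
const scanned = { stage: "COLLSCAN", direction: "forward" };

console.log(usesCollScan(indexed)); // false
console.log(usesCollScan(scanned)); // true
```

Run it against explain() output in CI and fail the build when a hot query regresses to a scan.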

Schema-less Does Not Mean Schema-free

“Schema-less” means MongoDB does not enforce a schema at the database level. It does not mean your application should work without a schema. Applications written without a defined schema produce collections where documents have inconsistent field names, inconsistent types, and inconsistent structure.

// What happens without schema discipline
{ user_id: "123", email: "[email protected]" }
{ userId: "124", Email: "[email protected]" }  // different field names
{ user_id: 125, email: "[email protected]" }   // integer vs string for user_id
{ user_id: "126" }                    // missing email

Querying this data is painful. An index on user_id does nothing for the documents that spell it userId, and MongoDB does not coerce types in queries, so matching the string "125" silently misses the document that stores the number 125.

The fix: use JSON Schema validation, which MongoDB supports:

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      required: ["user_id", "email"],
      properties: {
        user_id: { bsonType: "string" },
        email: { bsonType: "string" }
      }
    }
  }
})

Or use an ODM like Mongoose or Prisma that enforces schema at the application layer.
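Even without an ODM, the same contract can be enforced at the application boundary before writes. A minimal guard mirroring the $jsonSchema validator above (a sketch, not a full validation library):

```javascript
// Minimal application-side guard mirroring the $jsonSchema validator:
// user_id and email are required and must be strings
function isValidUser(doc) {
  return typeof doc.user_id === "string" && typeof doc.email === "string";
}

// Of the inconsistent documents shown earlier, only the first passes
console.log(isValidUser({ user_id: "123", email: "[email protected]" })); // true
console.log(isValidUser({ userId: "124", Email: "[email protected]" }));  // false (wrong field names)
console.log(isValidUser({ user_id: 125, email: "[email protected]" }));  // false (wrong type)
console.log(isValidUser({ user_id: "126" }));                  // false (missing email)
```

Belt and suspenders: the database validator rejects bad writes from any client, and the application check gives better error messages before the round trip.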

The N+1 Problem in Document Stores

The N+1 query problem from relational databases exists in MongoDB too. It just looks different.

// Fetching 100 posts then 100 separate queries for authors
const posts = await db.posts.find({}).limit(100).toArray();
for (const post of posts) {
  post.author = await db.users.findOne({ _id: post.author_id }); // 100 queries
}

The fix is the same as in relational databases: fetch the related data in one query using $lookup (equivalent to a JOIN) or by batching lookups:

// Batch lookup - one query for all authors
const authorIds = posts.map(p => p.author_id);
const authors = await db.users.find({ _id: { $in: authorIds } }).toArray();
// Keying by _id works here because the IDs are strings; ObjectIds would need .toString()
const authorMap = Object.fromEntries(authors.map(a => [a._id, a]));
posts.forEach(p => { p.author = authorMap[p.author_id]; });

Using MongoDB for Everything

MongoDB is excellent for document-oriented data with variable structure. It is not the right tool for:

  • Highly relational data with complex joins across many collections
  • Strong transactional requirements across many documents (multi-document transactions exist but are expensive)
  • Analytical queries over large datasets (use a columnar store or data warehouse)
  • Time-series data at high ingestion rates (MongoDB has a time-series collection type, but dedicated time-series databases are better)

The anti-pattern is treating MongoDB as a universal database and modeling everything in it, including data that would be more efficiently stored and queried in a different system entirely.

Sharding Too Early (or Not Planning for It)

MongoDB sharding horizontally distributes data across multiple servers. Getting the shard key wrong is a production disaster that is very hard to fix after the fact.

Common shard key choices and their problems:

  • user_id from a monotonically increasing sequence: all writes go to one shard (hotspot)
  • created_at: all writes go to the latest time range (hotspot)
  • A low-cardinality field (e.g., status): few distinct chunks, poor distribution
  • A random field (e.g., a UUID): good write distribution, but range queries scatter across all shards

A good shard key has high cardinality, even write distribution, and ideally matches your most common query patterns.

Most teams do not need sharding until they reach hundreds of gigabytes or very high write throughput. Vertical scaling (larger machines) is often the right answer longer than people expect. When you do shard, choose the shard key carefully before you have data to migrate.
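The hotspot vs. scatter trade-off from the list above is easy to simulate. A toy sketch (the hash function is an arbitrary stand-in, not MongoDB's actual hashed shard key algorithm):

```javascript
// Simulate write distribution across 4 shards for two shard key strategies
const SHARDS = 4;

// Toy hash - a stand-in, NOT MongoDB's real hashed-key function
function toShard(key) {
  let h = 0;
  for (const c of String(key)) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % SHARDS;
}

const counts = { monotonic: new Array(SHARDS).fill(0), hashed: new Array(SHARDS).fill(0) };
for (let i = 0; i < 10000; i++) {
  // Range-based sharding on a monotonically increasing key: every new key
  // falls in the top chunk, so one shard absorbs all writes
  counts.monotonic[SHARDS - 1]++;
  // Hashed sharding spreads the same keys across shards
  counts.hashed[toShard("user_" + i)]++;
}

console.log(counts.monotonic); // [ 0, 0, 0, 10000 ] - one shard takes every write
console.log(counts.hashed);    // spread across all four shards
```

The flip side, which the simulation does not show: with a hashed key, a range query over user IDs must fan out to every shard, which is why the key should also match your dominant query pattern.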

Bottom Line

MongoDB’s flexibility enables fast development and genuinely fits document-oriented data well. The anti-patterns - unbounded arrays, missing indexes, no schema discipline, N+1 queries, wrong shard keys - all seem harmless at small scale and become severe at production scale. Model your data with cardinality in mind from the start, design indexes before you need them, and enforce schema discipline at the application layer. And resist the temptation to use MongoDB for workloads that belong in different database systems; in many cases, Postgres with JSONB is the better choice.