Pull a random Node.js Dockerfile from GitHub and there is a reasonable chance it produces a 1.2 GB image. The API it serves handles simple CRUD requests. The actual compiled application logic is maybe 50 MB. The rest is node_modules, build tools, OS packages, and development dependencies that have no business being in production.
This is not a minor inefficiency. Large images slow down CI/CD pipelines, increase registry storage costs, slow deployments, and widen the attack surface of your containers. A 100 MB image deploys in about 15 seconds; a 1.2 GB image takes 2-3 minutes. Over hundreds of deployments, that adds up to hours of wasted time.
Why Images Bloat
The most common causes, in order of frequency:
1. Starting from a full OS image
# This is a 900MB+ starting point
FROM node:18
The node:18 image is based on Debian and includes a complete OS with compilers, headers, package managers, and utilities you will never use in production. The Alpine equivalent is 50 MB.
2. Installing dev dependencies in production
COPY package*.json ./
RUN npm install # installs ALL dependencies including devDependencies
npm install with no flags installs everything in package.json - test frameworks, type definitions, build tools, linters - unless NODE_ENV=production is set. A well-organized project might have 200 MB of production dependencies and 600 MB of dev dependencies.
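In a Dockerfile, the fix is a single flag (`--omit=dev` on npm 8+; older npm versions use `--production`):

```dockerfile
COPY package*.json ./
# npm ci installs exactly what the lockfile specifies (reproducible builds);
# --omit=dev skips everything listed under devDependencies
RUN npm ci --omit=dev
```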
3. Not using .dockerignore
Without a .dockerignore, COPY . . copies your node_modules/, .git/, test fixtures, local env files, and everything else in your project directory into the image. The node_modules/ folder alone is often 300-500 MB.
4. Layer bloat from not chaining commands
# Each RUN creates a new layer - the files from apt-get exist forever even after rm
RUN apt-get update
RUN apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*
The final rm does not shrink the image. Each layer is immutable, so the rm only adds a new layer that marks the files as deleted - the bytes written by the earlier apt-get layers still ship with the image. You must chain these commands into a single RUN.
5. No multi-stage builds
Building a compiled artifact (TypeScript to JavaScript, Go binary, Java JAR) requires build tools that aren’t needed to run the result. Without multi-stage builds, build tools end up in your production image.
The Fixes
Use Alpine or Distroless Base Images
| Base Image | Size |
|---|---|
| node:18 | ~950 MB |
| node:18-slim | ~240 MB |
| node:18-alpine | ~55 MB |
| gcr.io/distroless/nodejs18 | ~115 MB |
Start here. node:18-alpine alone reduces your base from ~950 MB to ~55 MB. One caveat: Alpine uses musl libc, so native npm modules built against glibc can occasionally misbehave - test before committing.
Distroless images (from Google) go further - they contain only the runtime, no shell, no package manager. Harder to debug but maximally minimal and secure.
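A hedged sketch of what a distroless variant might look like for a plain Node.js app (the exact tag varies by release; gcr.io/distroless/nodejs18-debian12 is one published variant, and the distroless Node images set node as the entrypoint, so CMD is just the script path):

```dockerfile
# Stage 1: install production dependencies on an image that has npm
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Stage 2: distroless runtime - no shell, no npm, no package manager
FROM gcr.io/distroless/nodejs18-debian12
WORKDIR /app
COPY --from=builder /app /app
# entrypoint is already `node`; CMD supplies the script to run
CMD ["index.js"]
```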
Multi-Stage Builds
The right pattern for a TypeScript application:
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM node:18-alpine AS production
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
What this does:
- Stage 1 installs everything and builds the TypeScript
- Stage 2 starts fresh with only production dependencies
- The build tools, TypeScript source, and dev dependencies never make it into the final image
For Go, the improvement is even more dramatic:
# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main ./cmd/server
# Stage 2: Production
FROM scratch
COPY --from=builder /app/main /main
EXPOSE 8080
CMD ["/main"]
FROM scratch is literally an empty image. A Go binary compiled with CGO_ENABLED=0 is fully self-contained. The resulting image is the size of the binary - typically 5-15 MB.
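One caveat worth knowing: scratch contains no CA certificate bundle, so any outbound HTTPS call from the binary will fail TLS verification. If your server makes external requests, copy the certificates in from the builder stage - a minimal sketch:

```dockerfile
FROM golang:1.22-alpine AS builder
# ca-certificates provides the trusted root bundle that scratch lacks
RUN apk add --no-cache ca-certificates
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main ./cmd/server

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/main /main
CMD ["/main"]
```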
.dockerignore Is Non-Negotiable
node_modules/
.git/
.gitignore
*.md
*.log
.env
.env.*
dist/
build/
coverage/
.nyc_output/
test/
tests/
__tests__/
*.test.ts
*.spec.ts
.eslintrc*
.prettierrc*
jest.config.*
# note: do not ignore tsconfig*.json if the builder stage runs tsc - the build needs it
This prevents COPY . . from pulling in hundreds of MB of files that don’t belong in the image.
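An alternative some teams prefer is an allowlist: ignore everything, then re-include only what the build needs with `!` negation entries. A sketch for a TypeScript project (the re-included paths are assumptions about your layout):

```
# ignore everything by default
*
# re-include only what the builder stage actually needs
!package.json
!package-lock.json
!tsconfig.json
!src/
```

New files added to the project are excluded by default, so the build context can never silently bloat again.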
Chain Your RUN Commands
# Wrong
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
# Right
RUN apt-get update && \
apt-get install -y --no-install-recommends curl && \
rm -rf /var/lib/apt/lists/*
The --no-install-recommends flag prevents apt from pulling in suggested packages you didn’t ask for.
Size Targets by Application Type
| Application Type | Reasonable Production Size |
|---|---|
| Simple Node.js API | 80-150 MB |
| Python Flask/FastAPI app | 100-200 MB |
| Go HTTP server | 10-25 MB |
| Java Spring Boot | 150-300 MB |
| Static site (nginx) | 25-50 MB |
If your image is more than 3x these numbers, something is wrong.
Audit Your Existing Images
# See layer sizes
docker history my-image:latest
# Dive tool - interactive layer explorer
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive my-image:latest
dive is the single most useful tool for understanding why an image is large. It shows you what each layer adds, which files are duplicated, and what percentage of the image is “wasted” by layer artifacts.
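dive also has a CI mode (run as `dive --ci my-image:latest`) that fails the build when an image gets too wasteful. A sketch of a `.dive-ci` config - the rule names come from dive's documentation, the thresholds here are illustrative:

```yaml
rules:
  # fail if less than 90% of image bytes are "efficient" (not duplicated or overwritten)
  lowestEfficiency: 0.9
  # fail if more than 20 MB is wasted by layer artifacts
  highestWastedBytes: 20MB
  # fail if wasted bytes exceed 10% of the total image size
  highestUserWastedPercent: 0.10
```

Wiring this into CI turns image size from something you audit occasionally into something that cannot regress unnoticed.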
Bottom Line
A 100 MB production image versus a 1.2 GB production image is a 12x improvement in deploy time, a 12x reduction in registry bandwidth, and a meaningfully smaller attack surface. The techniques required - multi-stage builds, Alpine base images, proper .dockerignore, chained RUN commands - take about 30 minutes to implement once and benefit every deployment forever. There is no good reason not to do this.