How to Read a Codebase You Have Never Seen Before

Joining a new codebase is one of the most disorienting experiences in software engineering, especially when the codebase is large, old, and underdocumented. Most engineers either spend weeks feeling lost or jump straight to making changes before they understand what they are touching.

Both approaches slow you down. Here is a better method.

The Wrong Way Most Engineers Do It

The instinct is to open the repo and start reading code top to bottom, or to search for the specific thing you need to change and tunnel-vision on it. Neither builds the mental model you actually need.

Reading code without understanding the domain and architecture is like reading a book by jumping between random pages. You collect words but not meaning.

Start With the Domain, Not the Code

Before you read a single line of code, understand what the software does from a user perspective.

Read the README end to end (yes, even if it is outdated - outdated READMEs tell you what the project used to be)
Look at the product, demo, or staging environment if one exists
Read recent pull request descriptions to understand what problems the team is solving right now
Look at the issue tracker for context on current priorities and known problems

This gives you vocabulary. When you encounter a class named OrderFulfillmentPipeline, you will have context for what “fulfillment” means in this system.

The Architecture Map

Before reading individual files, map the high-level structure.

Step 1: Identify the top-level modules or services. What are the major directories or packages? What does each one handle?

Step 2: Find the entry points. Where does execution start? For a web app, find the main router. For a service, find the main function. For a library, find the public API.

Step 3: Trace one user flow end to end. Pick the simplest, most fundamental thing the application does and follow it from input to output. For an API, that might be the health check endpoint or a simple GET. Trace the request through every layer: routing, middleware, controller, service, database query, and back.

This single exercise teaches you more than hours of random code reading.

The Questions That Unlock Understanding

As you read, ask these questions:

What data flows through this system? What are the core entities?
Where does data come in? (HTTP, message queue, file, database)
Where does data go out? (Response, another service, storage)
What are the error paths? How does the system handle failures?
Where does the business logic live? Is it in services, in models, or scattered across layers?

Sketch rough diagrams even if just on paper. The act of drawing forces you to make your mental model explicit.
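
A diagram does not have to start from a blank page. For a Python codebase, a rough list of which top-level package imports which can seed the sketch. The example below is a minimal sketch under stated assumptions: it only understands absolute imports, it treats each top-level directory or module under the root as one box, and the names imports_of and dependency_edges are made up for illustration.

```python
import ast
from pathlib import Path


def imports_of(source):
    """Return the top-level names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            found.add(node.module.split(".")[0])
    return found


def dependency_edges(root):
    """Return sorted (importer, imported) pairs between top-level units under root."""
    root = Path(root)
    local = {p.name for p in root.iterdir() if p.is_dir()}
    local |= {p.stem for p in root.glob("*.py")}
    edges = set()
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        owner = rel.parts[0] if len(rel.parts) > 1 else path.stem
        try:
            targets = imports_of(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for target in targets:
            if target in local and target != owner:
                edges.add((owner, target))
    return sorted(edges)
```

Each pair is one arrow in the diagram. Treat the result as a starting point to correct by hand: dynamic imports, plugin registries, and calls over the network will not show up.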

The Tools That Make This Faster

Tool	What It Helps With
IDE “Find usages”	See all callers of a function, trace execution paths
Git log on a file	See how a file has changed over time and why
Git blame	See who wrote what and when - crucial for legacy code context
Grep for class/function names	Find all usages across the codebase
Dependency graphs (for large repos)	Visual overview of module dependencies

Git blame is underused. When you see confusing code, check who wrote it and read the commit message. Often the “why” is right there. If it is not, you now know who to ask.
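
If you find yourself running the same git commands on file after file, a small wrapper can help. The following is a sketch rather than a polished tool. It assumes git is on your PATH and that you run it inside a repository. The function names (file_history, blame_line, parse_log, parse_blame) are invented for this example; the git options used (--follow, --format, --porcelain, -L) are standard ones.

```python
import subprocess

# %x1f is an ASCII unit separator, so commit subjects containing tabs or
# pipes cannot be mistaken for field boundaries.
LOG_FORMAT = "%h%x1f%an%x1f%ad%x1f%s"


def parse_log(output):
    """Turn `git log --format=<LOG_FORMAT>` output into a list of dicts."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        commit, author, date, subject = line.split("\x1f", 3)
        entries.append(
            {"commit": commit, "author": author, "date": date, "subject": subject}
        )
    return entries


def parse_blame(output):
    """Extract commit, author, and summary from porcelain blame of one line."""
    lines = output.splitlines()
    info = {"commit": lines[0].split(" ", 1)[0]}
    for line in lines[1:]:
        if line.startswith("author "):
            info["author"] = line[len("author "):]
        elif line.startswith("summary "):
            info["summary"] = line[len("summary "):]
    return info


def file_history(path, limit=10, repo="."):
    """Most recent commits touching path, following renames."""
    result = subprocess.run(
        ["git", "log", "--follow", "-n", str(limit), "--date=short",
         f"--format={LOG_FORMAT}", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


def blame_line(path, line_no, repo="."):
    """Who last changed a single line of path, and with what commit message."""
    result = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{line_no},{line_no}", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_blame(result.stdout)
```

The parsing is kept separate from the subprocess calls so that it can be checked without a repository at hand.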

The Shortcut: Follow the Tests

Tests, when they exist, are documentation. They show you:

What the intended behavior of a module is
The input shapes the system expects
The edge cases the team was thinking about

Reading tests before reading implementation often gives you a clearer picture of what code is supposed to do than reading the code itself.
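
To see how much a test file can tell you, consider the illustration below. Everything in it is hypothetical: normalize_sku is an invented helper, not taken from any real project. Skip the function at first, read only the test names and assertions, and notice that they already state the intended behavior, the expected input shape, and the edge cases.

```python
import unittest


def normalize_sku(raw):
    """Hypothetical helper: canonicalize a product code typed by a user."""
    if not isinstance(raw, str):
        raise TypeError("sku must be a string")
    cleaned = raw.strip().upper()
    if not cleaned:
        raise ValueError("sku must not be empty")
    return cleaned.replace(" ", "-")


class NormalizeSkuTest(unittest.TestCase):
    # Intended behavior: output is trimmed and uppercased.
    def test_trims_and_uppercases(self):
        self.assertEqual(normalize_sku("  ab-12 "), "AB-12")

    # Input shape: inner spaces are tolerated and become hyphens.
    def test_inner_spaces_become_hyphens(self):
        self.assertEqual(normalize_sku("ab 12"), "AB-12")

    # Edge case: whitespace-only input is rejected, not silently accepted.
    def test_rejects_blank_input(self):
        with self.assertRaises(ValueError):
            normalize_sku("   ")

    # Edge case: callers must pass strings, not numbers.
    def test_rejects_non_string(self):
        with self.assertRaises(TypeError):
            normalize_sku(12)
```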

How to Deal With Legacy Code

Legacy codebases bring extra challenges: often no tests, no documentation, and several overlapping architectural patterns left over from different eras of the project.

Approach:

Resist the urge to judge the code. Assume there were reasons.
Make your first change small and in a well-understood area.
Before touching any module, write a test that captures current behavior. This is your safety net.
When you find something confusing, add a comment explaining what you figured out. You are the first person to understand this in a while - document it for the next person.
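
A test that captures current behavior is often called a characterization test: you record what the code does today, quirks included, and fail if that ever changes. Below is a minimal sketch. The function legacy_shipping_cents is a hypothetical stand-in for the module you are about to touch, and the pinned values were obtained by running it once, not by deciding what the answer should be.

```python
import unittest


def legacy_shipping_cents(weight_kg, country):
    """Hypothetical legacy function with unexplained rules."""
    cents = 500 + 150 * weight_kg
    if country == "US":
        cents -= 100
    if weight_kg > 20:
        cents = cents * 11 // 10
    return cents


# Outputs observed from the current implementation. They are not a spec:
# the negative-weight case is probably a bug, but it is pinned anyway so
# that any change to it is deliberate.
PINNED = {
    (1, "US"): 550,
    (1, "DE"): 650,
    (25, "US"): 4565,
    (0, "DE"): 500,
    (-1, "DE"): 350,
}


class ShippingCharacterizationTest(unittest.TestCase):
    def test_behavior_is_unchanged(self):
        for (weight_kg, country), expected in PINNED.items():
            with self.subTest(weight_kg=weight_kg, country=country):
                self.assertEqual(
                    legacy_shipping_cents(weight_kg, country), expected
                )
```

Once this passes, you can refactor with the test as your safety net. If a pinned value has to change, that change becomes a visible decision in the diff rather than an accident.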

Building Your Mental Model Over Time

Full understanding of a large codebase takes months. Give yourself permission to work at different levels of resolution:

High resolution: the areas you actively work in
Medium resolution: areas adjacent to yours
Low resolution: other services or modules you rarely touch

You do not need to understand everything immediately. You need to understand enough to do your work without introducing bugs - and enough to know where to look when something breaks.

The Note-Taking System That Helps

Keep a running document as you explore. For each major module or subsystem you investigate, write:

What it does
Its entry points
How it connects to other systems
Any gotchas, surprises, or things that confused you initially

This document becomes invaluable when you return to a part of the codebase you have not touched in months. It also helps when onboarding the next new person.

Bottom Line

Reading an unfamiliar codebase is a skill that gets better with practice. Start with the domain and architecture before diving into code. Trace one user flow end to end. Use git blame to understand why code exists. Follow the tests. Take notes. You will never have perfect understanding, but systematic exploration gets you to functional understanding much faster than hoping clarity arrives on its own.
