In early 2023, LangChain had a GitHub star velocity that was almost unprecedented for a developer tools project. Engineers building LLM applications reached for it by default - it had chains, agents, memory, vector store integrations, and enough examples to get started in an afternoon.

By 2025, the post-mortems are in. Most teams that shipped production LLM applications with LangChain eventually rewrote the core without it. The most common explanation: “we spent more time fighting the abstractions than building the product.”

What LangChain Got Right

LangChain’s early success was not accidental. When it launched in late 2022, there was no established pattern for building LLM applications. LangChain provided:

  • A unified interface for multiple LLM providers
  • Pre-built chains for common patterns (Q&A, summarization, extraction)
  • Memory implementations for conversation history
  • Vector store integrations with a common interface
  • Agent frameworks for tool-using LLMs

For prototyping and exploration, this was genuinely valuable. You could build a functional RAG pipeline in 50 lines.

Where It Fell Apart in Production

Abstraction Leakage

Every LangChain abstraction exposes configuration options for the very implementation details it is trying to hide. When something breaks, you need to understand both the LangChain abstraction and the underlying system.

Debugging a retrieval chain meant understanding how LangChain’s RetrievalQA chain constructed its prompt, what format it expected from the vector store, how it merged retrieved context, and what the final prompt looked like. None of this was easily inspectable.

A common complaint: “I just want to see what prompt is being sent to the model.” Getting that required either monkey-patching, enabling verbose mode, or reading the source code of three nested classes.
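What teams wanted instead is trivial without a framework: if the prompt is built by a plain function, the final messages are one print() away. A minimal sketch of that kind of inspectability (the helper names here are illustrative, not LangChain APIs):

```python
def build_qa_prompt(context: str, question: str) -> list[dict]:
    """Return the exact messages that will be sent to the model."""
    return [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_qa_prompt(
    "Paris is the capital of France.", "What is the capital of France?"
)
for m in messages:
    # Fully visible before any API call -- no verbose mode, no monkey-patching.
    print(m["role"], "->", m["content"])
```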

API Instability

LangChain released LCEL (LangChain Expression Language) as a replacement for the original chain API. Teams that had built on the original API faced migration costs. The framework’s rapid iteration, while impressive, created breaking changes every few months for production users.

Over-Engineering Simple Things

The LangChain way of calling an LLM and parsing its output:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])
model = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()
chain = prompt | model | parser
result = chain.invoke({"input": "What is 2+2?"})

The direct API approach:

from openai import OpenAI
client = OpenAI()
result = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ]
)
print(result.choices[0].message.content)

For applications where you control the prompt and the model, LangChain adds a layer with no compensating benefit.

What Replaced It

Direct API with Structured Outputs

For most use cases, the OpenAI SDK (or Anthropic SDK) used directly covers everything LangChain’s model interfaces provided:

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class ExtractedData(BaseModel):
    company: str
    revenue: float
    year: int

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": document_text}],
    response_format=ExtractedData,
)
extracted = response.choices[0].message.parsed

The .parse() method with Pydantic models handles structured output extraction without a framework. This is now the recommended pattern for data extraction tasks.

LlamaIndex for RAG

LlamaIndex (formerly GPT Index) focused its scope on the retrieval and knowledge base problem. It has better support for:

  • Complex document ingestion pipelines
  • Multiple retrieval strategies (hybrid, hierarchical, summary indices)
  • Knowledge graph construction
  • Evaluation tools for retrieval quality

Teams building serious RAG systems have largely moved to LlamaIndex over LangChain because LlamaIndex’s abstractions are better aligned with the retrieval problem.
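Of the strategies listed above, hybrid retrieval is the one most teams reach for first. The standard way to merge a keyword ranking with a vector ranking is reciprocal rank fusion; a minimal sketch of the scoring (function and document names are illustrative, not LlamaIndex APIs):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one.

    Each document scores sum(1 / (k + rank)) over the lists that contain it;
    k=60 is the constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from BM25
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from embedding search
merged = reciprocal_rank_fusion([keyword_hits, vector_hits])
# doc_a ranks first: it appears near the top of both lists.
```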

Instructor for Structured Extraction

The instructor library wraps the OpenAI SDK to make structured extraction with Pydantic models ergonomic and reliable:

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class ContactInfo(BaseModel):
    name: str
    email: str
    company: str

contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=ContactInfo,
    messages=[{"role": "user", "content": email_text}],
)

Instructor handles retry logic, validation, and partial parsing. It solves one problem well.
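The retry behavior is worth understanding even if you let the library do it: when validation fails, the model is re-asked with the validation error included as feedback. A stripped-down sketch of that loop with the model call stubbed out (this is an illustration of the pattern, not instructor’s actual code):

```python
def extract_with_retries(call_model, validate, max_retries: int = 3):
    """Call the model, validate the output, and re-ask with the error on failure."""
    feedback = None
    for _ in range(max_retries):
        raw = call_model(feedback)  # pass the prior validation error back, if any
        try:
            return validate(raw)    # e.g. Pydantic validation in the real library
        except ValueError as err:
            feedback = f"Your last answer failed validation: {err}. Try again."
    raise RuntimeError("model never produced valid output")

# Stub model: fails validation once, then succeeds on the retry.
attempts = []
def fake_model(feedback):
    attempts.append(feedback)
    return {"email": "a@b.com"} if feedback else {"email": ""}

def require_email(raw):
    if not raw["email"]:
        raise ValueError("email must be non-empty")
    return raw

result = extract_with_retries(fake_model, require_email)
```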

DSPy for Systematic Prompt Engineering

DSPy (from Stanford) takes a fundamentally different approach: instead of writing prompts manually, you write program logic and DSPy optimizes the prompts to achieve desired outcomes on example inputs:

import dspy

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

DSPy then optimizes the prompts in ChainOfThought automatically using few-shot examples. This is a more principled approach than hand-writing prompts for complex pipelines.

Pydantic AI for Agents

For agent systems (LLMs that use tools), Pydantic AI offers a cleaner model than LangChain’s agent classes:

from pydantic_ai import Agent

agent = Agent(
    'openai:gpt-4o',
    system_prompt='You are a helpful assistant with tool access.',
)

@agent.tool_plain
def search_database(query: str) -> str:
    return db.search(query)

result = agent.run_sync("Find orders over $500 from last month")

The tool definition is type-safe, the agent loop is inspectable, and the abstraction level is appropriate.

The Pattern That Emerged

Production AI applications in 2025 are mostly built with:

  1. Direct SDK calls for simple inference
  2. Instructor or Pydantic AI for structured output and tool use
  3. LlamaIndex for RAG-heavy applications
  4. Custom prompt templates managed in version control, not framework classes

The pattern is: use a framework only for the specific problem it solves well, and use direct API calls for everything else.
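Point 4 above needs no framework at all: a prompt template can be a plain text file next to the code, rendered with the standard library. A minimal sketch (the template text and field names are illustrative; in practice the template would live in its own file under version control, inlined here to keep the example self-contained):

```python
from string import Template

# In a real project: Template(Path("prompts/summarize.txt").read_text())
SUMMARIZE_TEMPLATE = Template(
    "You are a precise summarizer.\n"
    "Summarize the following text in at most $max_sentences sentences:\n\n$text"
)

prompt = SUMMARIZE_TEMPLATE.substitute(
    max_sentences=2, text="LangChain rose and fell."
)
```

Because the template is just a file, changes to it show up in code review diffs like any other change.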

Bottom Line

LangChain’s decline was not about technical inferiority - it was the natural maturation of an ecosystem that started with nothing. When LangChain launched, there were no good patterns for LLM applications, so it supplied some. Better patterns now exist, and they are simpler.

The replacement is not a single framework. It is smaller, focused libraries that each do one thing well: LlamaIndex for retrieval, Instructor for extraction, Pydantic AI for agents. This is a healthier ecosystem than a single monolithic framework, even if it requires more integration work.