In early 2023, LangChain had a GitHub star velocity that was almost unprecedented for a developer tools project. Engineers building LLM applications reached for it by default - it had chains, agents, memory, vector store integrations, and enough examples to get started in an afternoon.
By 2025, the post-mortems are in. Most teams that shipped production LLM applications with LangChain eventually rewrote the core without it. The most common explanation: “we spent more time fighting the abstractions than building the product.”
What LangChain Got Right
LangChain’s early success was not accidental. When it launched in late 2022, there was no established pattern for building LLM applications. LangChain provided:
- A unified interface for multiple LLM providers
- Pre-built chains for common patterns (Q&A, summarization, extraction)
- Memory implementations for conversation history
- Vector store integrations with a common interface
- Agent frameworks for tool-using LLMs
For prototyping and exploration, this was genuinely valuable. You could build a functional RAG pipeline in 50 lines.
Where It Fell Apart in Production
Abstraction Leakage
Every LangChain abstraction exposes configuration options for the very implementation details it is trying to hide. The moment you hit a problem, you have to understand both the LangChain layer and the underlying system.
Debugging a retrieval chain meant understanding how LangChain’s RetrievalQA chain constructed its prompt, what format it expected from the vector store, how it merged retrieved context, and what the final prompt looked like. None of this was easily inspectable.
A common complaint: “I just want to see what prompt is being sent to the model.” Getting that required either monkey-patching, enabling verbose mode, or reading the source code of three nested classes.
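The fix most teams landed on was to own prompt assembly themselves, so there is nothing left to uncover. A minimal sketch of the idea (function and argument names are illustrative, not from any library):

```python
def build_messages(system_prompt: str, context_chunks: list[str], question: str) -> list[dict]:
    """Assemble the final prompt by hand -- what gets sent is exactly what you see."""
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

# Inspecting the outgoing prompt is just printing a list -- no verbose mode needed.
for message in build_messages("Answer from the context only.", ["Chunk A", "Chunk B"], "What is A?"):
    print(message["role"], "->", message["content"])
```

The entire "what prompt is being sent" question reduces to looking at one plain data structure.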
API Instability
LangChain released LCEL (LangChain Expression Language) as a replacement for the original chain API. Teams that had built on the original API faced migration costs. The framework’s rapid iteration, while impressive, created breaking changes every few months for production users.
Over-Engineering Simple Things
The LangChain way of calling an LLM and parsing its output:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
])
model = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()

chain = prompt | model | parser
result = chain.invoke({"input": "What is 2+2?"})
```
The direct API approach:
```python
from openai import OpenAI

client = OpenAI()
result = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ],
)
print(result.choices[0].message.content)
```
For applications where you control the prompt and the model, LangChain adds a layer with no compensating benefit.
What Replaced It
Direct API with Structured Outputs
For most use cases, calling the OpenAI SDK (or Anthropic SDK) directly covers everything LangChain’s model interfaces provided:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class ExtractedData(BaseModel):
    company: str
    revenue: float
    year: int

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": document_text}],
    response_format=ExtractedData,
)
extracted = response.choices[0].message.parsed
```
The .parse() method with Pydantic models handles structured output extraction without a framework. This is now the recommended pattern for data extraction tasks.
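For contrast, here is roughly what that one call replaces: hand-validating a JSON string the model returned. A stdlib-only sketch mirroring the field names above (the schema dict is illustrative, not an OpenAI API):

```python
import json

# Expected schema, mirroring the ExtractedData model above.
REQUIRED_FIELDS = {"company": str, "revenue": float, "year": int}

def parse_extraction(raw: str) -> dict:
    """Naively validate a model's JSON reply against the expected schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field}: expected {expected_type.__name__}")
    return data

print(parse_extraction('{"company": "Acme", "revenue": 12.5, "year": 2024}'))
```

Even this naive version misses cases (coercion, nested objects, retrying on failure) that the SDK and Pydantic handle for you.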
LlamaIndex for RAG
LlamaIndex (formerly GPT Index) focused its scope on the retrieval and knowledge base problem. It has better support for:
- Complex document ingestion pipelines
- Multiple retrieval strategies (hybrid, hierarchical, summary indices)
- Knowledge graph construction
- Evaluation tools for retrieval quality
Teams building serious RAG systems have largely moved from LangChain to LlamaIndex, because LlamaIndex’s abstractions are better aligned with the retrieval problem.
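The “hybrid” strategy in that list is simple to state framework-free: score each document by both keyword overlap and vector similarity, then blend the two. A toy sketch, with vector scores passed in as stand-ins for real embedding similarities:

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query: str, docs: list[str], vector_scores: list[float], alpha: float = 0.5) -> list[int]:
    """Blend keyword overlap with precomputed vector similarity; return indices, best first."""
    combined = [alpha * keyword_score(query, d) + (1 - alpha) * v
                for d, v in zip(docs, vector_scores)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)

docs = ["refund policy for orders", "shipping times by region"]
print(hybrid_rank("refund policy", docs, vector_scores=[0.2, 0.9]))
```

Production implementations add normalization and reciprocal-rank fusion, but the blending idea is the same.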
Instructor for Structured Extraction
The instructor library wraps the OpenAI SDK to make structured extraction with Pydantic models ergonomic and reliable:
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class ContactInfo(BaseModel):
    name: str
    email: str
    company: str

contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=ContactInfo,
    messages=[{"role": "user", "content": email_text}],
)
```
Instructor handles retry logic, validation, and partial parsing. It solves one problem well.
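The retry mechanism is worth understanding even if you use the library: call the model, validate the reply, and on failure feed the error back into a reprompt. A framework-free sketch of that loop (`call_model` and `validate` are stand-ins, not instructor internals):

```python
def extract_with_retries(call_model, validate, prompt: str, max_attempts: int = 3):
    """Call the model, validate the reply, and reprompt with the error on failure."""
    last_error = None
    for _ in range(max_attempts):
        message = prompt if last_error is None else (
            f"{prompt}\nYour last reply failed validation: {last_error}. Try again."
        )
        reply = call_model(message)
        try:
            return validate(reply)  # e.g. Pydantic model_validate_json
        except ValueError as e:
            last_error = e
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {last_error}")
```

Feeding the validation error back into the prompt is what makes retries converge instead of repeating the same mistake.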
DSPy for Systematic Prompt Engineering
DSPy (from Stanford) takes a fundamentally different approach: instead of writing prompts manually, you write program logic and DSPy optimizes the prompts to achieve desired outcomes on example inputs:
```python
import dspy

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)
```
DSPy then optimizes the prompts in ChainOfThought automatically using few-shot examples. This is a more principled approach than hand-writing prompts for complex pipelines.
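The underlying optimization is easy to state without DSPy’s API: generate candidate sets of few-shot demos, score each resulting prompt against a labeled dev set, and keep the best. A toy illustration of that search (the pipeline and metric are stand-ins, not DSPy internals):

```python
def best_demos(candidates, dev_set, run_pipeline):
    """Pick the demo set whose prompt scores highest on the dev set (exact-match metric)."""
    def score(demos):
        hits = sum(run_pipeline(demos, q) == gold for q, gold in dev_set)
        return hits / len(dev_set)
    return max(candidates, key=score)

# Stand-in pipeline: pretend that prompts with at least two demos answer correctly.
dev = [("2+2", "4"), ("3+3", "6")]
pipeline = lambda demos, q: {"2+2": "4", "3+3": "6"}.get(q) if len(demos) >= 2 else "?"
print(best_demos([["d1"], ["d1", "d2"]], dev, pipeline))
```

DSPy’s optimizers are far more sophisticated (bootstrapped demos, instruction search), but this is the shape of the problem they solve.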
Pydantic AI for Agents
For agent systems (LLMs that use tools), Pydantic AI offers a cleaner model than LangChain’s agent classes:
```python
from pydantic_ai import Agent

agent = Agent(
    'openai:gpt-4o',
    system_prompt='You are a helpful assistant with tool access.',
)

@agent.tool_plain
def search_database(query: str) -> str:
    return db.search(query)  # db: your application's database client

result = agent.run_sync("Find orders over $500 from last month")
```
The tool definition is type-safe, the agent loop is inspectable, and the abstraction level is appropriate.
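“Inspectable” here is concrete: the whole agent loop fits in a dozen lines you could write yourself. The model either requests a tool call or returns a final answer, and the loop dispatches until done. A framework-free sketch (`call_model` is a stand-in for a real tool-calling API):

```python
def run_agent(call_model, tools: dict, user_message: str, max_steps: int = 5):
    """Minimal tool-use loop: dispatch tool calls until the model returns a final answer."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        action = call_model(history)  # {"tool": name, "args": {...}} or {"answer": text}
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "name": action["tool"], "content": str(result)})
    raise RuntimeError("agent did not finish")
```

When the loop is this small, debugging a misbehaving agent means reading `history`, not stepping through framework classes.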
The Pattern That Emerged
Production AI applications in 2025 are mostly built with:
- Direct SDK calls for simple inference
- Instructor or Pydantic AI for structured output and tool use
- LlamaIndex for RAG-heavy applications
- Custom prompt templates managed in version control, not framework classes
The pattern is: use a framework only for the specific problem it solves well, and use direct API calls for everything else.
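“Custom prompt templates in version control” can be as plain as text files plus stdlib templating. A sketch using `string.Template` (the template is inlined here to stay self-contained; in a real repo it would live as a file under something like `prompts/`):

```python
from string import Template

# In practice: Template(Path("prompts/summarize.txt").read_text())
TEMPLATES = {
    "summarize": Template("Summarize the following document in $style style:\n\n$document"),
}

def render(name: str, **variables) -> str:
    """Render a named prompt template; a missing variable raises KeyError."""
    return TEMPLATES[name].substitute(**variables)

print(render("summarize", style="bullet-point", document="Q3 revenue rose 12%."))
```

Templates that are plain files get diffed, reviewed, and rolled back like any other code, which is exactly what framework-managed prompt classes made awkward.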
Bottom Line
LangChain’s decline was not about technical inferiority - it was about the natural maturation of an ecosystem that started under-equipped. When LangChain launched, there were no good patterns for LLM applications. It provided patterns. Now better patterns exist, and they are simpler.
The replacement is not a single framework. It is smaller, focused libraries that each do one thing well: LlamaIndex for retrieval, Instructor for extraction, Pydantic AI for agents. This is a healthier ecosystem than a single monolithic framework, even if it requires more integration work.