For three years, OpenTelemetry was “almost ready.” The specification was stable, the SDKs were in beta, the collector was changing frequently, and production war stories were thin. Recommending it felt like recommending software that worked great in demos but needed another year of hardening.
That year has passed. OpenTelemetry is production-ready. More importantly, the problem it solves - observability vendor lock-in - is real and expensive.
The Problem OpenTelemetry Solves
Every observability vendor (Datadog, New Relic, Honeycomb, Dynatrace) has proprietary agents. You instrument your application with their SDK, your data flows to their backend, and migrating to a competitor means re-instrumenting your entire codebase.
OpenTelemetry is the CNCF standard for producing telemetry data (traces, metrics, logs); its trace-context propagation format is a W3C recommendation. You instrument once using the OpenTelemetry SDK, route data through the OpenTelemetry Collector, and send it to any backend that speaks OTLP (the OpenTelemetry Protocol). Every major observability vendor now accepts OTLP.
This means you can switch from Datadog to Honeycomb by changing a configuration line in the Collector, not by re-instrumenting your application.
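Concretely, the switch is one exporter stanza in the Collector. A sketch (the Honeycomb endpoint and header come from their public OTLP docs; the "before" endpoint is illustrative):

```yaml
exporters:
  otlp:
    # before: endpoint: "otlp.your-old-vendor.example:4317"
    endpoint: "api.honeycomb.io:443"
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```

Your application code and SDK configuration do not change at all; only the Collector redeploys.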
The State of the Specs in 2025
| Signal | Spec Status | SDK Status |
|---|---|---|
| Traces | Stable (since 2021) | Stable in Go, Java, Python, JS, .NET, Ruby |
| Metrics | Stable (since 2022) | Stable in Go, Java, Python, JS |
| Logs | Stable (since 2023) | Stable in Java, Go; RC in Python, JS |
| Profiles | Beta | Beta in Go, Java |
All three primary signals are now stable. The main work remaining is SDK completeness in less common languages and improving the local development experience.
Where to Start
Step 1: Automatic Instrumentation for Traces
Start with zero-code auto-instrumentation. Most frameworks have plugins that add tracing without modifying your application code.
For Node.js:
```shell
npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http
```

```js
// tracing.js - must be loaded before your application code
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces', // the Collector's OTLP/HTTP endpoint
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

```shell
node -r ./tracing.js app.js
```
This automatically traces Express routes, database calls (pg, mysql2, mongoose), Redis operations, HTTP outbound requests, and AWS SDK calls. No application code changes.
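If you would rather skip writing tracing.js entirely, the auto-instrumentations package also ships a `--require` hook configured purely through the standard OTEL_* environment variables. A sketch (service name and endpoint are illustrative):

```shell
# Zero-code alternative: no tracing.js at all
export OTEL_SERVICE_NAME=my-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
node --require @opentelemetry/auto-instrumentations-node/register app.js
```

This is convenient for containers, where the environment variables can live in the deployment manifest rather than in code.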
Step 2: Deploy the OpenTelemetry Collector
The Collector is the routing layer. It receives telemetry from your applications and forwards it to backends.
A minimal otel-collector-config.yaml:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s

exporters:
  otlp:
    endpoint: "your-backend.example.com:4317"
  debug:              # console output; replaces the deprecated `logging` exporter
    verbosity: normal

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, debug]
```
Run it in Docker:
```shell
docker run -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml" \
  otel/opentelemetry-collector-contrib \
  --config=/etc/otel-collector-config.yaml
```
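Before wiring up an application, you can smoke-test the Collector by POSTing a minimal span over OTLP/HTTP. This is a hand-written sketch of the OTLP JSON encoding (the IDs and timestamps are arbitrary valid values); the Collector's console exporter should print the span:

```shell
curl -s http://localhost:4318/v1/traces \
  -H 'Content-Type: application/json' \
  -d '{
    "resourceSpans": [{
      "resource": {
        "attributes": [{
          "key": "service.name",
          "value": {"stringValue": "smoke-test"}
        }]
      },
      "scopeSpans": [{
        "spans": [{
          "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
          "spanId": "051581bf3cb55c13",
          "name": "smoke-test-span",
          "kind": 1,
          "startTimeUnixNano": "1700000000000000000",
          "endTimeUnixNano": "1700000001000000000"
        }]
      }]
    }]
  }'
```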
Step 3: Add Custom Spans for Business Logic
Auto-instrumentation covers infrastructure calls. For business logic, add manual spans:
```js
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('your-service');

async function processOrder(orderId) {
  const span = tracer.startSpan('processOrder');
  span.setAttribute('order.id', orderId);
  try {
    const order = await fetchOrder(orderId); // auto-traced
    span.setAttribute('order.total', order.total);
    await chargePayment(order); // auto-traced
    await sendConfirmation(order.email);
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw err;
  } finally {
    span.end();
  }
}
```
The Backend Options
Once you are sending OTLP, your options for receiving it are excellent:
| Backend | Type | Cost model |
|---|---|---|
| Jaeger | Self-hosted | Free |
| Grafana Tempo | Self-hosted or managed | Free / usage-based |
| Honeycomb | Managed | Usage-based |
| Datadog | Managed | Seat + usage |
| Lightstep (now ServiceNow Cloud Observability) | Managed | Usage-based |
| SigNoz | Self-hosted | Free |
For teams starting out, Grafana’s stack (Tempo for traces, Mimir for metrics, Loki for logs) is free, well-documented, and runs well on Kubernetes.
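Tempo accepts OTLP directly, so routing traces to it is one more exporter stanza in the Collector. A sketch, assuming an in-cluster `tempo` service name (adjust the endpoint and TLS settings for your deployment):

```yaml
exporters:
  otlp/tempo:
    endpoint: tempo:4317   # Tempo's OTLP gRPC port; hypothetical service name
    tls:
      insecure: true       # fine inside a cluster; use real TLS across networks

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
```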
The Sampling Question
At scale, tracing every request is expensive. OpenTelemetry’s Collector supports tail-based sampling:
```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 1000}
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```
This samples all errors, all slow requests, and 5% of everything else. You keep full visibility into problems while controlling costs.
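To get a feel for the resulting volume, here is a back-of-envelope estimate. The traffic profile is an assumption for illustration (not a benchmark), and it treats the error/slow/baseline buckets as disjoint, which real traffic won't quite be:

```javascript
// Illustrative traffic profile - these rates are assumptions, not measurements.
const totalTraces = 1_000_000; // traces per day
const errorRate = 0.002;       // 0.2% of traces contain an error
const slowRate = 0.01;         // 1% exceed the 1000 ms latency threshold
const baseline = 0.05;         // 5% probabilistic baseline

// Policies are OR-ed: a trace is kept if ANY policy matches.
// Simplification: assume no trace is both an error and slow.
const errors = totalTraces * errorRate;
const slow = totalTraces * slowRate;
const sampled = (totalTraces - errors - slow) * baseline;
const kept = errors + slow + sampled;

console.log(
  `kept ~${Math.round(kept)} of ${totalTraces} traces ` +
  `(${((100 * kept) / totalTraces).toFixed(1)}%)`
);
// prints: kept ~61400 of 1000000 traces (6.1%)
```

Under these assumptions you store about 6% of traces while retaining every error and every slow request, which is the point of tail-based sampling: the decision is made after the whole trace is seen.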
Bottom Line
OpenTelemetry solves a real problem - vendor lock-in in observability tooling is expensive when you want to switch or use multiple backends. The implementation path is now practical: start with auto-instrumentation to get immediate visibility, add the Collector for routing, add custom spans for business context.
Start with traces. Metrics and logs can follow once you have distributed tracing in place. The instrumentation investment is a one-time cost that gives you permanent flexibility in where your data goes.