For three years, OpenTelemetry was “almost ready.” The specification was stable, the SDKs were in beta, the collector was changing frequently, and production war stories were thin. Recommending it felt like recommending software that worked great in demos but needed another year of hardening.
That year has passed. OpenTelemetry is production-ready. More importantly, the problem it solves - observability vendor lock-in - is real and expensive.
The Problem OpenTelemetry Solves
Every observability vendor (Datadog, New Relic, Honeycomb, Dynatrace) has proprietary agents. You instrument your application with their SDK, your data flows to their backend, and migrating to a competitor means re-instrumenting your entire codebase.
OpenTelemetry is the CNCF standard for producing telemetry data (traces, metrics, logs); its trace-context propagation format is a W3C recommendation. You instrument once using the OpenTelemetry SDK, route data through the OpenTelemetry Collector, and send it to any backend that speaks OTLP (the OpenTelemetry Protocol). Every major observability vendor now accepts OTLP.
This means you can switch from Datadog to Honeycomb by changing a configuration line in the Collector, not by re-instrumenting your application.
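Concretely, the switch is one exporter stanza in the Collector. A sketch (the Honeycomb endpoint and header come from their public OTLP docs; the "before" endpoint is illustrative):

```yaml
exporters:
  otlp:
    # before: endpoint: "otlp.your-old-vendor.example:4317"
    endpoint: "api.honeycomb.io:443"
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```

Your application code and SDK configuration do not change at all; only the Collector redeploys.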
The State of the Specs in 2025
| Signal | Spec Status | SDK Status |
|---|---|---|
| Traces | Stable (since 2021) | Stable in Go, Java, Python, JS, .NET, Ruby |
| Metrics | Stable (since 2022) | Stable in Go, Java, Python, JS |
| Logs | Stable (since 2023) | Stable in Java, Go; RC in Python, JS |
| Profiles | Beta | Beta in Go, Java |
All three primary signals are now stable. The main work remaining is SDK completeness in less common languages and improving the local development experience.
Where to Start
Step 1: Automatic Instrumentation for Traces
Start with zero-code auto-instrumentation. Most frameworks have plugins that add tracing without modifying your application code.
For Node.js:
```shell
npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http
```

```js
// tracing.js - must be loaded before your application code
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces', // the Collector's OTLP/HTTP endpoint
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

```shell
node -r ./tracing.js app.js
```
This automatically traces Express routes, database calls (pg, mysql2, mongoose), Redis operations, HTTP outbound requests, and AWS SDK calls. No application code changes.
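If you would rather skip writing tracing.js entirely, the auto-instrumentations package also ships a `--require` hook configured purely through the standard OTEL_* environment variables. A sketch (service name and endpoint are illustrative):

```shell
# Zero-code alternative: no tracing.js at all
export OTEL_SERVICE_NAME=my-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
node --require @opentelemetry/auto-instrumentations-node/register app.js
```

This is convenient for containers, where the environment variables can live in the deployment manifest rather than in code.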
Step 2: Deploy the OpenTelemetry Collector
The Collector is the routing layer. It receives telemetry from your applications and forwards it to backends.
A minimal otel-collector-config.yaml:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s

exporters:
  otlp:
    endpoint: "your-backend.example.com:4317"
  debug:              # console output; replaces the deprecated `logging` exporter
    verbosity: normal

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, debug]
```
Run it in Docker:
```shell
docker run -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml" \
  otel/opentelemetry-collector-contrib \
  --config=/etc/otel-collector-config.yaml
```
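Before wiring up an application, you can smoke-test the Collector by POSTing a minimal span over OTLP/HTTP. This is a hand-written sketch of the OTLP JSON encoding (the IDs and timestamps are arbitrary valid values); the Collector's console exporter should print the span:

```shell
curl -s http://localhost:4318/v1/traces \
  -H 'Content-Type: application/json' \
  -d '{
    "resourceSpans": [{
      "resource": {
        "attributes": [{
          "key": "service.name",
          "value": {"stringValue": "smoke-test"}
        }]
      },
      "scopeSpans": [{
        "spans": [{
          "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
          "spanId": "051581bf3cb55c13",
          "name": "smoke-test-span",
          "kind": 1,
          "startTimeUnixNano": "1700000000000000000",
          "endTimeUnixNano": "1700000001000000000"
        }]
      }]
    }]
  }'
```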
Step 3: Add Custom Spans for Business Logic
Auto-instrumentation covers infrastructure calls. For business logic, add manual spans:
```js
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('your-service');

async function processOrder(orderId) {
  const span = tracer.startSpan('processOrder');
  span.setAttribute('order.id', orderId);
  try {
    const order = await fetchOrder(orderId); // auto-traced
    span.setAttribute('order.total', order.total);
    await chargePayment(order); // auto-traced
    await sendConfirmation(order.email);
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw err;
  } finally {
    span.end();
  }
}
```
The Backend Options
Once you are sending OTLP, your options for receiving it are excellent:
| Backend | Type | Cost model |
|---|---|---|
| Jaeger | Self-hosted | Free |
| Grafana Tempo | Self-hosted or managed | Free / usage-based |
| Honeycomb | Managed | Usage-based |
| Datadog | Managed | Seat + usage |
| Lightstep (now ServiceNow Cloud Observability) | Managed | Usage-based |
| SigNoz | Self-hosted | Free |
For teams starting out, Grafana’s stack (Tempo for traces, Mimir for metrics, Loki for logs) is free, well-documented, and runs well on Kubernetes.
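Tempo accepts OTLP directly, so routing traces to it is one more exporter stanza in the Collector. A sketch, assuming an in-cluster `tempo` service name (adjust the endpoint and TLS settings for your deployment):

```yaml
exporters:
  otlp/tempo:
    endpoint: tempo:4317   # Tempo's OTLP gRPC port; hypothetical service name
    tls:
      insecure: true       # fine inside a cluster; use real TLS across networks

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
```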
The Sampling Question
At scale, tracing every request is expensive. OpenTelemetry’s Collector supports tail-based sampling:
```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 1000}
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```
This samples all errors, all slow requests, and 5% of everything else. You keep full visibility into problems while controlling costs.
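To get a feel for the resulting volume, here is a back-of-envelope estimate. The traffic profile is an assumption for illustration (not a benchmark), and it treats the error/slow/baseline buckets as disjoint, which real traffic won't quite be:

```javascript
// Illustrative traffic profile - these rates are assumptions, not measurements.
const totalTraces = 1_000_000; // traces per day
const errorRate = 0.002;       // 0.2% of traces contain an error
const slowRate = 0.01;         // 1% exceed the 1000 ms latency threshold
const baseline = 0.05;         // 5% probabilistic baseline

// Policies are OR-ed: a trace is kept if ANY policy matches.
// Simplification: assume no trace is both an error and slow.
const errors = totalTraces * errorRate;
const slow = totalTraces * slowRate;
const sampled = (totalTraces - errors - slow) * baseline;
const kept = errors + slow + sampled;

console.log(
  `kept ~${Math.round(kept)} of ${totalTraces} traces ` +
  `(${((100 * kept) / totalTraces).toFixed(1)}%)`
);
// prints: kept ~61400 of 1000000 traces (6.1%)
```

Under these assumptions you store about 6% of traces while retaining every error and every slow request, which is the point of tail-based sampling: the decision is made after the whole trace is seen.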
Bottom Line
OpenTelemetry solves a real problem - vendor lock-in in observability tooling is expensive when you want to switch or use multiple backends. The implementation path is now practical: start with auto-instrumentation to get immediate visibility, add the Collector for routing, add custom spans for business context.
Start with traces. Metrics and logs can follow once you have distributed tracing in place. The instrumentation investment is a one-time cost that gives you permanent flexibility in where your data goes.