OpenTelemetry (OTel) is an open source project that provides a vendor-neutral standard for collecting, processing, and exporting telemetry data from distributed systems (such as a microservices architecture). This simplified, universal approach to observability makes it easier for developers to analyze their software’s performance and behavior and to diagnose and debug issues in their applications. OTel collects three types of data:

  • Traces – “Where is the problem?”
  • Metrics – “Is there a problem?”
  • Logs – “What is the problem?”

OTel is not a programming language or a product. This open source project has been around since 2019 and is hosted by the Cloud Native Computing Foundation (CNCF).

Types of Data Generated by OTel

Traces

A trace records the events that happen during an operation, such as the handling of a single request. The trace is divided into a series of spans, each representing a unit of work.

For example, the trace for a web request might include three spans:

  • Accepting the request
  • Querying the database
  • Sending a response

A trace slices a data flow, which may span multiple services, into a series of chronologically ordered chunks (spans) that help you understand:

  • All the steps that happened in each chunk
  • The order in which the chunks executed
  • How long each step lasted
  • Metadata about each step
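To make this structure concrete, here is a minimal Python sketch of the three-span web request described above. The `Span` and `Trace` classes are hypothetical, simplified models (not the OTel SDK's classes), and the trace ID, services, timings, and attribute values are made up for illustration. Sorting by start time recovers the order of the chunks, and each span carries its own duration and metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, simplified span: one unit of work inside a trace."""
    name: str
    service: str
    start_ms: float
    end_ms: float
    attributes: dict = field(default_factory=dict)  # metadata about the step

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


@dataclass
class Trace:
    trace_id: str
    spans: list = field(default_factory=list)

    def ordered(self) -> list:
        # Chronological order of the chunks, by span start time.
        return sorted(self.spans, key=lambda s: s.start_ms)


# Spans may arrive out of order, possibly from different services.
trace = Trace("trace-0001", [
    Span("send response", "web", 42.0, 45.0, {"http.status_code": 200}),
    Span("accept request", "web", 0.0, 2.0, {"http.method": "GET"}),
    Span("query database", "db", 2.0, 42.0, {"db.system": "postgresql"}),
])

for span in trace.ordered():
    print(f"{span.name:<15} {span.service:<4} {span.duration_ms:5.1f} ms {span.attributes}")
```

In this made-up trace, the ordered output immediately shows that the database query accounts for most of the request's time.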

Once OTel has generated traces, the next step is to export them to a tracing backend or tool for analysis. OTel provides a set of exporters for popular backends such as Jaeger, Zipkin, and AWS X‑Ray. These services provide tools for analyzing and visualizing trace data.
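Because exporters are pluggable, the same spans can be delivered to different backends without changing the instrumentation that produced them. The sketch below illustrates that idea with a hypothetical `SpanExporter` protocol and two toy exporters. It is not the OTel SDK's exporter API; real exporters send data over the network in OTLP or a backend-specific format.

```python
import json
from typing import Protocol


class SpanExporter(Protocol):
    """Hypothetical exporter interface: hand a batch of finished spans to one backend."""
    def export(self, spans: list) -> bool: ...


class JsonLinesExporter:
    """Serializes each span as one JSON line (stand-in for a file or HTTP backend)."""
    def __init__(self) -> None:
        self.lines: list = []

    def export(self, spans: list) -> bool:
        self.lines.extend(json.dumps(span, sort_keys=True) for span in spans)
        return True


class ConsoleExporter:
    """Prints a human-readable summary of each span."""
    def export(self, spans: list) -> bool:
        for span in spans:
            print(f"{span['name']}: {span['duration_ms']} ms")
        return True


def export_all(spans: list, exporters: list) -> dict:
    # Spans are produced once; each configured exporter delivers them to its own backend.
    return {type(e).__name__: e.export(spans) for e in exporters}


spans = [{"name": "query database", "duration_ms": 40.0, "trace_id": "trace-0001"}]
jsonl = JsonLinesExporter()
results = export_all(spans, [jsonl, ConsoleExporter()])
print(results)
```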

Metrics

In OTel, metrics are numeric measurements of specific aspects of a system’s behavior, collected over time and annotated with key‑value pairs (known as metric labels, or attributes in current OTel terminology). The key‑value pairs provide context about each measurement. For example, a metric for the response time of a web service might include labels for the HTTP status code, the endpoint, and the HTTP method. All metric data points are also timestamped, again to enable chronological ordering.
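The following Python sketch models the response-time example: each data point is a numeric value plus a timestamp and a set of labels, and the labels let you slice the same metric by endpoint, method, or status code. The `DataPoint` class, the `record` and `mean_by_label` helpers, and the metric name `response_time_ms` are hypothetical. Real OTel SDKs aggregate measurements through instruments such as counters and histograms rather than storing raw points like this.

```python
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical metric data point: a value, its labels, and when it was taken."""
    name: str
    value: float
    labels: tuple  # sorted (key, value) pairs so the point stays hashable
    timestamp: float


def record(points: list, name: str, value: float, labels: dict) -> None:
    points.append(DataPoint(name, value, tuple(sorted(labels.items())), time.time()))


def mean_by_label(points: list, name: str, key: str) -> dict:
    """Average one metric's values, grouped by the value of a single label."""
    groups = defaultdict(list)
    for p in points:
        if p.name == name:
            groups[dict(p.labels)[key]].append(p.value)
    return {label_value: sum(vals) / len(vals) for label_value, vals in groups.items()}


points: list = []
record(points, "response_time_ms", 120.0, {"endpoint": "/messages", "method": "GET", "status_code": 200})
record(points, "response_time_ms", 80.0, {"endpoint": "/messages", "method": "GET", "status_code": 200})
record(points, "response_time_ms", 300.0, {"endpoint": "/login", "method": "POST", "status_code": 500})

print(mean_by_label(points, "response_time_ms", "endpoint"))
```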

Logs

Logs are the oldest and most common method for getting insight into what is going on with a given service. They are generally produced as text that must be parsed before it yields insights. When this article was written, support for logs in OTel was still experimental.
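As a small illustration of why text logs need parsing, the sketch below turns lines in a hypothetical `<timestamp> <LEVEL> [<service>] <message>` format into structured records that can be filtered. The log format, the `parse` helper, and the sample lines are assumptions made up for illustration.

```python
import re

# Hypothetical plain-text log format: "<timestamp> <LEVEL> [<service>] <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)$"
)


def parse(line: str):
    """Turn one text log line into a dict of fields, or None if it doesn't match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None


lines = [
    "2023-05-01T12:00:00Z INFO [messenger] message 42 stored",
    "2023-05-01T12:00:01Z ERROR [notifier] failed to deliver message 42",
    "garbage line",  # unparseable lines are dropped
]

records = [r for r in map(parse, lines) if r]
errors = [r for r in records if r["level"] == "ERROR"]
print(errors)
```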

To learn more about what our solution architects discovered when they compared the observability feature sets in OTel against other observability tools, see Integrating OpenTelemetry into the Modern Apps Reference Architecture – A Progress Report on our blog.

OTel Instrumentation

OTel integrates with many popular programming languages, libraries, and frameworks, though support is more comprehensive in some languages than in others. For example, at the time of writing, the JavaScript instrumentation libraries had self‑described “stable” implementations for both tracing and metrics and some of the most stable support for logs. They also provide an auto‑instrumentation option that lets you start receiving traces without adding any instrumentation‑specific code to your service logic. By contrast, languages like Go had less mature support for metrics and logs and lacked auto‑instrumentation features.
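One common way auto-instrumentation works is by wrapping (monkey-patching) functions at load time, so the service code itself contains no telemetry calls. The Python sketch below illustrates only that mechanism. `instrument_module` and the `handlers` module are hypothetical, and the recorded `(name, seconds)` tuples stand in for real spans.

```python
import functools
import time
import types


def _timed(fn, recorded: list):
    """Return a wrapper that times each call to `fn` and records (name, seconds)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            recorded.append((fn.__name__, time.perf_counter() - start))
    return wrapper


def instrument_module(module: types.ModuleType, recorded: list) -> None:
    """Replace every public function in `module` with a timed wrapper, in place."""
    for name, obj in list(vars(module).items()):
        if not name.startswith("_") and isinstance(obj, types.FunctionType):
            setattr(module, name, _timed(obj, recorded))


# Stand-in for application code: plain business logic with no telemetry calls.
handlers = types.ModuleType("handlers")


def get_messages(user: str) -> list:
    return [f"hello {user}"]


handlers.get_messages = get_messages

spans: list = []
instrument_module(handlers, spans)   # done once at startup, outside the service logic
print(handlers.get_messages("ada"))  # callers are unchanged
print(spans)
```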

Telemetry Goals

When setting up telemetry instrumentation, it’s best to start with a set of goals that is more specific than “send everything and hope for insights.” While it’s true that you can’t know the full extent of what’s possible until you view the data, setting some minimal requirements helps ensure the smooth operation and maintenance of your services.

These can be technical concerns like:

  • I want to know when my service is under pressure and needs scaling.
  • I want to know if my service is restarting often.

But they can also be product and user experience‑related concerns like:

  • I want users to see new messages in the system within five seconds.
  • I want notifications to be dispatched within one minute of a message being sent.

As an example from our tutorial How to Use OpenTelemetry Tracing to Understand Your Microservices, you might define the following as the key goals:

  • Understand all the steps a request takes to accomplish the new message flow.
  • Check that the user flow completed successfully.
  • Have confidence that the user flow is executing faster than five seconds from end to end (under “normal” circumstances).
  • Learn whether the notifier service is processing the event (dispatched by the messenger service) in a timely manner. 
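Goals like these can be checked directly against trace data. The sketch below evaluates one hypothetical trace of the new message flow against them. The span names, timings, and the two-second threshold for a "timely" notifier are assumptions for illustration; only the five-second end-to-end budget comes from the goals above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical span: timings in seconds relative to the start of the request."""
    name: str
    service: str
    start_s: float
    end_s: float
    ok: bool = True


# One made-up trace of the new message flow.
trace = [
    Span("POST /messages", "messenger", 0.00, 0.35),
    Span("dispatch new_message event", "messenger", 0.30, 0.32),
    Span("process new_message event", "notifier", 1.10, 1.40),
]

E2E_BUDGET_S = 5.0           # goal: flow completes in under five seconds
NOTIFIER_LAG_BUDGET_S = 2.0  # hypothetical threshold for "timely"


def check_goals(spans: list) -> dict:
    by_name = {s.name: s for s in spans}
    end_to_end = max(s.end_s for s in spans) - min(s.start_s for s in spans)
    # Time between the messenger dispatching the event and the notifier picking it up.
    lag = by_name["process new_message event"].start_s - by_name["dispatch new_message event"].end_s
    return {
        "steps": [s.name for s in sorted(spans, key=lambda s: s.start_s)],
        "succeeded": all(s.ok for s in spans),
        "fast_enough": end_to_end < E2E_BUDGET_S,
        "notifier_timely": lag <= NOTIFIER_LAG_BUDGET_S,
        "end_to_end_s": round(end_to_end, 3),
        "notifier_lag_s": round(lag, 3),
    }


print(check_goals(trace))
```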

OTel Implementation

OTel provides developers with a single set of application programming interfaces (APIs), software development kits (SDKs), and instrumentation libraries they can use to instrument their applications in a consistent and standardized way.

Because the format of the data produced by OTel is widely treated as an industry standard, many telemetry aggregation and visualization solutions accept it. You can choose an on‑premises solution, like Jaeger (as we did in that tutorial), or opt for a Software-as-a-Service (SaaS) solution, like Sumo Logic or SigNoz.

Without OTel, managing all three types of telemetry typically requires a combination of multiple tools. That adds even more complexity on top of the inherent complexity of running a microservices architecture and its infrastructure.

What Is an API in the Context of OTel?

APIs define the methods, functions, and protocols used by software components to interact with each other. The OTel APIs define a standard set of methods and protocols that developers can use to instrument their applications and collect telemetry data.
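One practical consequence of this design is that instrumented code depends only on the API, and when no SDK is installed the API calls simply do nothing. The Python sketch below illustrates that separation with a hypothetical `Tracer` protocol, a no-op default, and `get_tracer`/`set_tracer` helpers. The names loosely echo OTel's design, but this is not the actual OTel Python API.

```python
from contextlib import contextmanager
from typing import ContextManager, Protocol


class Tracer(Protocol):
    """Hypothetical API surface that instrumented code is allowed to call."""
    def start_span(self, name: str) -> ContextManager[None]: ...


class NoOpTracer:
    """Default implementation: every call succeeds, nothing is recorded."""
    @contextmanager
    def start_span(self, name: str):
        yield


_global_tracer: Tracer = NoOpTracer()


def set_tracer(tracer: Tracer) -> None:
    """Called once by the application (or an SDK) to install a real implementation."""
    global _global_tracer
    _global_tracer = tracer


def get_tracer() -> Tracer:
    return _global_tracer


def handle_request() -> str:
    # Library code depends only on the API, never on a particular SDK or backend.
    with get_tracer().start_span("handle_request"):
        return "ok"


print(handle_request())  # works even though no SDK has been installed
```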

What Is an SDK in the Context of OTel?

SDKs are software development tools provided by the author of a standard or application that make it easier for developers to build applications that conform to the standard or interact with the app. SDKs typically include libraries, code samples, documentation, and tools for testing, debugging, and performance tuning. OTel provides SDKs for tracing, metrics, and resource management.
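Where the API only declares what instrumented code may call, an SDK supplies the implementation that actually records telemetry and attaches resource information. The Python sketch below shows a hypothetical `RecordingTracer` that times spans and stamps each one with the attributes of a `Resource`. The attribute keys `service.name` and `telemetry.sdk.language` follow OTel's semantic conventions, but the classes and the merge behavior shown here are simplified assumptions, not the OTel SDK itself.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical model of the entity that produces telemetry (for example, a service)."""
    attributes: dict

    def merge(self, other: "Resource") -> "Resource":
        # In this sketch, values from `other` win when both define the same key.
        return Resource({**self.attributes, **other.attributes})


@dataclass
class FinishedSpan:
    name: str
    duration_s: float
    resource: dict


@dataclass
class RecordingTracer:
    """SDK-side implementation: actually times spans and keeps them for export."""
    resource: Resource
    finished: list = field(default_factory=list)

    @contextmanager
    def start_span(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.finished.append(
                FinishedSpan(name, time.perf_counter() - start, dict(self.resource.attributes))
            )


defaults = Resource({"service.name": "unknown_service", "telemetry.sdk.language": "python"})
tracer = RecordingTracer(defaults.merge(Resource({"service.name": "messenger"})))

with tracer.start_span("store message"):
    pass  # application work would happen here

print(tracer.finished[0].name, tracer.finished[0].resource["service.name"])
```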