
JEP 349: JFR Event Streaming

Summary

Expose JDK Flight Recorder data for continuous monitoring.

Goals

  • Provide an API for the continuous consumption of JFR data on disk, both for in-process and out-of-process applications.
  • Record the same set of events as in the non-streaming case, with overhead less than 1% if possible.
  • Event streaming must be able to co-exist with non-streaming recordings, both disk and memory based.

Non-Goals

  • Provide synchronous callbacks for consumers.
  • Allow consumption of in-memory recordings.

Motivation

The HotSpot VM emits more than 500 data points using JFR, most of which are not available by any means other than parsing log files.

To consume the data today, a user must start a recording, stop it, dump the contents to disk and then parse the recording file. This works well for application profiling, where typically at least a minute of data is being recorded at a time, but not for monitoring purposes. An example of monitoring usage is a dashboard which displays dynamic updates to the data.

There is overhead associated with creating a recording, such as:

  • Emitting events that must occur when a new recording is created,
  • Writing event metadata, such as the field layout,
  • Writing checkpoint data, such as stack traces, and
  • Copying data from the disk repository to a separate recording file.

If there were a way to read data being recorded from the disk repository without creating a new recording file, much of this overhead could be avoided.

Description

The package jdk.jfr.consumer, in module jdk.jfr, is extended with functionality to subscribe to events asynchronously. Users can read, or stream, recording data directly from the disk repository without dumping a recording file. The way to interact with a stream is to register a handler, for example a lambda function, to be invoked in response to the arrival of an event.

The following example prints the overall CPU usage and locks contended for more than 10 ms.

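import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

try (var rs = new RecordingStream()) {
    // Sample overall CPU load once per second.
    rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
    // Report only monitor contention that lasts longer than 10 ms.
    rs.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
    rs.onEvent("jdk.CPULoad", event -> {
        System.out.println(event.getFloat("machineTotal"));
    });
    rs.onEvent("jdk.JavaMonitorEnter", event -> {
        System.out.println(event.getClass("monitorClass"));
    });
    // start() blocks the current thread until the stream is closed.
    rs.start();
}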

The RecordingStream class implements the interface jdk.jfr.consumer.EventStream, which provides a uniform way to filter and consume events regardless of whether the source is a live stream or a file on disk.

public interface EventStream extends AutoCloseable {
    public static EventStream openRepository();
    public static EventStream openRepository(Path directory);
    public static EventStream openFile(Path file);

    void setStartTime(Instant startTime);
    void setEndTime(Instant endTime);
    void setOrdered(boolean ordered);
    void setReuse(boolean reuse);

    void onEvent(Consumer<RecordedEvent> handler);
    void onEvent(String eventName, Consumer<RecordedEvent> handler);
    void onFlush(Runnable handler);
    void onClose(Runnable handler);
    void onError(Runnable handler);
    void remove(Object handler);

    void start();
    void startAsync();

    void awaitTermination();
    void awaitTermination(Duration duration);
    void close();
}

There are three factory methods to create a stream. EventStream::openRepository(Path) constructs a stream from a disk repository. This is a way to monitor other processes by working directly against the file system. The location of the disk repository is stored in the system property "jdk.jfr.repository" that can be read using the attach API. It is also possible to perform in-process monitoring using the EventStream::openRepository() method. Unlike RecordingStream, it does not start a recording. Instead, the stream receives events only when recordings are started by external means, for example using JCMD or JMX. The method EventStream::openFile(Path) creates a stream from a recording file. It complements the RecordingFile class that already exists today.
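As a hedged illustration of EventStream::openFile(Path), the sketch below records a few events of a hypothetical custom type (example.Tick, which is not part of the JDK), dumps them to a temporary file, and then consumes that file through the streaming interface. The class and method names are assumptions made for this example, and the calls follow the API as shipped in JDK 14, where the factory methods declare IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.EventStream;

class FileStreamExample {
    // Hypothetical application event, used only for this illustration.
    @Name("example.Tick")
    static class Tick extends Event {
        int value;
    }

    // Records the values 1..n, dumps them to a file, and sums them via a file stream.
    static int sumFromFile(int n) {
        try {
            Path file = Files.createTempFile("ticks", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable(Tick.class);
                recording.start();
                for (int i = 1; i <= n; i++) {
                    Tick tick = new Tick();
                    tick.value = i;
                    tick.commit();
                }
                recording.stop();
                recording.dump(file);
            }
            AtomicInteger sum = new AtomicInteger();
            try (EventStream stream = EventStream.openFile(file)) {
                // Only events named example.Tick reach this handler.
                stream.onEvent("example.Tick", e -> sum.addAndGet(e.getInt("value")));
                // For a file source, start() returns once the end of the file is reached.
                stream.start();
            }
            Files.deleteIfExists(file);
            return sum.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumFromFile(4));
    }
}
```

The same handler registration would work unchanged against a stream obtained from EventStream::openRepository, which is the point of the uniform interface.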

The interface can also be used to set the amount of data to buffer and whether events should be ordered chronologically. To minimize allocation pressure, there is also an option to control whether a new event object is allocated for each event or a previous object can be reused. A stream can be started in the current thread or asynchronously.
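The options described above might be combined as in the following sketch, which starts a stream asynchronously and collects values from a hypothetical custom event (example.Step). The class name, the event, and the ten-second timeout are assumptions for illustration; setMaxAge is a method of RecordingStream in the shipped API rather than of the interface listed earlier.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.consumer.RecordingStream;

class AsyncStreamExample {
    // Hypothetical application event, used only for this illustration.
    @Name("example.Step")
    static class Step extends Event {
        int index;
    }

    // Emits n events on the calling thread and returns the indexes seen by the handler.
    static List<Integer> collect(int n) {
        List<Integer> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(n);
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable(Step.class);
            rs.setMaxAge(Duration.ofMinutes(1)); // bound the data kept in the repository
            rs.setOrdered(true);                 // deliver events in chronological order
            rs.setReuse(false);                  // allocate a fresh object per event
            rs.onEvent("example.Step", e -> {
                seen.add(e.getInt("index"));
                done.countDown();
            });
            rs.startAsync();                     // handlers run on a separate thread
            for (int i = 0; i < n; i++) {
                Step step = new Step();
                step.index = i;
                step.commit();
            }
            // Events arrive after the next periodic flush, so wait with a timeout.
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(seen);
    }

    public static void main(String[] args) {
        System.out.println(collect(3));
    }
}
```

Disabling reuse matters only when a handler keeps a reference to the event object after it returns; this handler copies out a primitive, so reuse could safely stay enabled here.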

The Java Virtual Machine (JVM) flushes events stored in thread-local buffers to the disk repository once every second. A separate thread parses the most recent file, up to the point at which data has been written, and pushes the events to subscribers. To keep overhead low, only actively subscribed events are read from the file. To receive a notification when a flush is complete, a handler can be registered using the EventStream::onFlush(Runnable) method. This is an opportunity to aggregate or push data to external systems while the JVM is preparing the next set of events.
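One way to use the flush notification for aggregation is sketched below: the event handler accumulates a running total, and the flush handler hands each batch to a stand-in for an external system. The event type (example.Request), the class name, and the "reported" counter are hypothetical and exist only for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.consumer.RecordingStream;

class FlushAggregationExample {
    // Hypothetical application event, used only for this illustration.
    @Name("example.Request")
    static class Request extends Event {
        long bytes;
    }

    // Emits the given number of events and returns the total pushed by the flush handler.
    static long reportBytes(int requests, long bytesEach) {
        long expected = requests * bytesEach;
        AtomicLong pending = new AtomicLong();  // accumulated since the last flush
        AtomicLong reported = new AtomicLong(); // stand-in for an external system
        CountDownLatch done = new CountDownLatch(1);
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable(Request.class);
            rs.onEvent("example.Request", e -> pending.addAndGet(e.getLong("bytes")));
            rs.onFlush(() -> {
                // Push one aggregated value per flush instead of one per event.
                long batch = pending.getAndSet(0);
                if (reported.addAndGet(batch) >= expected) {
                    done.countDown();
                }
            });
            rs.startAsync();
            for (int i = 0; i < requests; i++) {
                Request request = new Request();
                request.bytes = bytesEach;
                request.commit();
            }
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reported.get();
    }

    public static void main(String[] args) {
        System.out.println(reportBytes(5, 100L));
    }
}
```

Batching per flush keeps the number of outbound calls proportional to the flush rate rather than to the event rate.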

Alternatives

JMX notifications provide a means for the JDK and third-party applications to expose information for continuous monitoring. There are, however, drawbacks that make JMX unsuited for the purpose of this JEP.

  • Data points collected in the JVM often occur at places where a call to Java code is not possible, for instance during a GC-induced safepoint.
  • Developer time has already been invested in collecting data using JFR. Rewriting all those probe points for JMX would be a very large effort.
  • JMX doesn't provide a mechanism to filter out events before they are sent, which means that the system could easily be flooded.
  • Complex data structures with references, such as stack traces, can't be efficiently represented using Open MBean types.

Testing

  • Verify that the feature doesn't have any memory leaks.
  • Verify that the feature has stable performance over time (appropriate stress testing).
  • Write unit tests for all exported methods.
  • Validate that event subscriptions work with other recordings running simultaneously.
  • Verify that the API works well out of the box.
  • Verify that the API is suitable for forwarding event data for consumption by other frameworks.
  • Verify that the API is suitable for environments where low latency is important (minimal GC pauses).
  • Verify that the API is suitable for tools vendors, i.e. data arriving at a rate suitable for charting.
  • Verify that the API is secure; it should not be possible to get a callback in a privileged thread context.
  • Validate that the overhead is acceptable.
  • Verify that it's not possible to create infinite recursion in subscribers.

Risks and Assumptions

  • Operations in API callbacks may provoke JFR events, which could lead to infinite recursion. This can be mitigated by not recording events in such a situation.