Outlier detection (NRQL alert)

Alerts offers NRQL conditions in three threshold types: static, baseline, and outlier. This document explains how the outlier threshold type works, gives some example use cases and NRQL queries, and explains how to create an outlier condition.

Important

NRQL alerts do not affect alerts policies for a Synthetic monitor. For example, muting a NRQL alert will not mute a Synthetic monitor's alerts.

What is outlier detection?

In software development and operations, it is common to have a group of members that you expect to behave approximately the same. For example, for servers behind a load balancer, the traffic to the servers may go up or down, but the traffic for all the servers should remain in a fairly tight grouping. See outlier detection in action in this NerdBytes video (2:51 minutes).

The NRQL alert outlier detection feature parses the data returned by your faceted NRQL query and:

Looks for the number of expected groups that you specify
Looks for outliers (values deviating from a group) based on the sensitivity and time range you set

Additionally, for queries that have more than one group, you can choose to be notified when groups start behaving the same.

This visual aid will help you understand the types of situations that will trigger a violation and those that won't.

For more on the rules and logic behind this calculation, see Rules and logic.

Tip

This feature does not take into account the past behavior of the monitored values; it looks for outliers only in the currently reported data. For an alert type that takes past behavior into account, see Baseline alerting.

Example use cases

These use cases will help you understand when to use the outlier threshold type. Note that the outlier feature requires a NRQL query with a FACET clause.

Notify if load-balanced servers have uneven workload

A load balancer divides web traffic approximately evenly across five different servers. You can set a notification to be sent if any server starts getting significantly more or less traffic than the other servers.

Example query:

SELECT average(cpuPercent) FROM SystemSample WHERE apmApplicationNames = 'MY-APP-NAME' FACET hostname

Notify if load-balanced application has misbehaving instances

Application instances behind a load balancer should have similar throughput, error rates, and response times. If an instance is in a bad state, or a load balancer is misconfigured, this will not be the case. Detecting one or two bad app instances using aggregate metrics may be difficult if there is not a significant rise in the overall error rate of the application.

You can set a notification for when an app instance’s throughput, error rate, or response time deviates too far from the rest of the group.

Example query:

SELECT average(duration) FROM Transaction WHERE appName = 'MY-APP-NAME' FACET host

Notify of changes in different environments

An application is deployed in two different environments, with ten application instances in each. One environment is experimental and gets more errors than the other. But the instances that are in the same environment should get approximately the same number of errors.

You can set a notification for when an instance starts getting more errors than the other instances in the same environment. Also, you can set a notification for when the two environments start to have the same number of errors as each other.

Notify for time zone-related changes

The number of logged-in users for a company is about the same for each of four applications, but varies significantly across the three time zones the company operates in.

You can set a notification for when any application starts getting more or less traffic from a certain time zone than the other applications. Sometimes the traffic from the different time zones is the same, so you would set up the alert condition so that you are not notified when the time zone groups overlap.

For more details on how this feature works, see Rules and logic.

Create an outlier alert condition

EOL NOTICE

As of March 31, 2022, we're discontinuing support for several capabilities, including NRQL outlier alert conditions. For more details, including how you can easily prepare for this transition, see our Explorers Hub post and our transition guide for alert capabilities.

To create a NRQL alert that uses outlier detection:

When creating a condition, under Select a product, select NRQL.
For Threshold type, select Outlier.
Create a NRQL query with a FACET clause that returns the values you want to alert on.
Depending on how the returned values group together, set the Number of expected groups.
Adjust the deviation from the center of the group(s) and the duration that will trigger a violation.
Optional: Add a warning threshold and set its deviation.
Set any remaining available options and save.

Rules and logic

Here are the rules and logic behind how outlier detection works:

After the condition is created, the query is run once every harvest cycle and the condition is applied. Unlike baseline alerts, outlier detection uses no historical data in its calculation; it's calculated using the currently collected data.

Alerts will attempt to divide the data returned from the query into the number of groups selected during condition creation.
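
The grouping algorithm itself is not described here. As a rough illustration only, the sketch below assumes a simple approach: sort the returned values and cut at the widest gaps until the expected number of groups is reached. The function name `split_into_groups` and the sample error counts are hypothetical, and the real feature may group data differently.

```python
def split_into_groups(values, expected_groups):
    """Split 1-D values into `expected_groups` clusters at the widest gaps.

    Assumption: a largest-gap split stands in for the undocumented
    grouping step; it is not the product's actual algorithm.
    """
    ordered = sorted(values)
    if expected_groups < 1 or expected_groups > len(ordered):
        raise ValueError("expected_groups must be between 1 and len(values)")

    # Index i marks the gap between ordered[i] and ordered[i + 1].
    # Rank the gaps by width, widest first.
    gaps = sorted(
        range(len(ordered) - 1),
        key=lambda i: ordered[i + 1] - ordered[i],
        reverse=True,
    )
    # Keep the (expected_groups - 1) widest gaps as cut points.
    cut_points = sorted(gaps[: expected_groups - 1])

    groups, start = [], 0
    for cut in cut_points:
        groups.append(ordered[start : cut + 1])
        start = cut + 1
    groups.append(ordered[start:])
    return groups


# Hypothetical error counts from instances in two environments.
error_counts = [2, 3, 2, 4, 20, 22, 21, 19]
groups = split_into_groups(error_counts, expected_groups=2)
```

With these sample values, the widest gap lies between 4 and 19, so the low-error and high-error instances land in separate groups. A naive split like this can also isolate a single extreme value as its own group, which is one reason to treat it as an illustration rather than a description of the feature.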

For each group, the approximate average value is calculated. The allowable deviation you have chosen when creating the condition is centered around that average value. If a member of the group is outside the allowed deviation, it produces a violation.
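
A minimal sketch of the deviation check for a single group follows. It assumes the group's center is the plain arithmetic mean (the text above only says "approximate average"), and the function name `find_violations`, the host names, and the values are hypothetical.

```python
from statistics import mean


def find_violations(group, deviation):
    """Return the facets whose value lies outside the allowed deviation.

    `group` maps a facet name (for example, a hostname) to its current value.
    Assumption: the center is the arithmetic mean of the group's values.
    """
    center = mean(group.values())
    return {
        facet: value
        for facet, value in group.items()
        if abs(value - center) > deviation
    }


# Hypothetical values for five load-balanced servers.
servers = {
    "host-1": 41.0,
    "host-2": 39.5,
    "host-3": 40.5,
    "host-4": 40.0,
    "host-5": 58.0,
}
violations = find_violations(servers, deviation=10.0)
```

Here the mean is 43.8, so only `host-5` (14.2 away from the center) falls outside an allowed deviation of 10. Note that the outlier itself pulls the mean toward it, which is why the choice of deviation matters.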

If Trigger when groups overlap has been selected, Alerts detects a convergence of groups. If the condition is looking for two or more groups, and the returned values cannot be separated into that number of distinct groups, this produces a violation. This type of “overlap” event is represented on a chart by group bands touching.
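
One way to picture "group bands touching" is to treat each group as a band that extends the allowed deviation above and below its center. The sketch below assumes that model and uses the arithmetic mean as the center; `bands_overlap` and the sample groups are hypothetical and not taken from the product.

```python
from statistics import mean


def bands_overlap(groups, deviation):
    """Return True if any two neighboring group bands touch or overlap.

    Assumption: each band spans [center - deviation, center + deviation],
    so two bands touch when their centers are at most 2 * deviation apart.
    """
    centers = sorted(mean(group) for group in groups)
    return any(
        upper - lower <= 2 * deviation
        for lower, upper in zip(centers, centers[1:])
    )


# Hypothetical error counts for two environments.
separated = [[2, 3, 4], [19, 20, 21]]     # centers 3 and 20
converging = [[10, 11, 12], [17, 18, 19]]  # centers 11 and 18

separated_result = bands_overlap(separated, deviation=5)
converging_result = bands_overlap(converging, deviation=5)
```

With a deviation of 5, the first pair of centers is 17 apart and the bands stay clear of each other, while the second pair is only 7 apart and the bands overlap.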

Because this feature does not take past behavior into account, data is never considered to "belong" to a certain group. For example, a value that switches places with another value wouldn't trigger a violation. Additionally, an entire group that moves together also wouldn't trigger a violation.
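
The two examples above can be checked with a small sketch. It assumes a single expected group whose center is the arithmetic mean, and evaluates each snapshot on its own with no memory of earlier ones; `count_violations` and the sample data are hypothetical.

```python
from statistics import mean


def count_violations(snapshot, deviation):
    """Count values outside the allowed deviation in one snapshot.

    Only the current values are used; nothing is remembered between calls.
    """
    center = mean(snapshot.values())
    return sum(1 for value in snapshot.values() if abs(value - center) > deviation)


before = {"host-1": 40.0, "host-2": 42.0, "host-3": 41.0}
# host-1 and host-2 switch places: the set of values is unchanged.
swapped = {"host-1": 42.0, "host-2": 40.0, "host-3": 41.0}
# The entire group moves up together by 30.
shifted = {host: value + 30.0 for host, value in before.items()}

results = [count_violations(s, deviation=5.0) for s in (before, swapped, shifted)]
```

All three snapshots produce zero violations, because the check depends only on how the current values sit relative to each other, not on which facet reported which value or where the group was previously.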
