Baseline your ingest data

Baseline

In this stage, you get a high-level view of all the telemetry your organization currently generates. The focus is on breaking down ingest stats into groups such as account, telemetry type, and application. These figures inform the Optimize your ingest data and Forecast your ingest data stages.

You'll learn how to generate a structured breakdown report for the following dimensions:

  • Organization
  • Sub account
  • Billable Telemetry Type

In addition, you'll learn how to create highly granular breakdowns, including:

  • Application (APM|Browser|Mobile)
  • K8s Cluster
  • Infrastructure Integration

Desired outcome

Understand exactly which groups within the org are contributing which types of data and how much.

Prerequisites

Process

Install the data governance baseline dashboard
Add ingest target indicators to your dashboard
Generate a tabular 30 day ingest report
Customize your report
Detect ingest anomalies
Install the entity breakdown dashboard (optional)
Install the cloud integration dashboard (optional)

Install the data governance baseline dashboard

  1. Navigate to the data governance quickstart.
  2. Click Install this quickstart in the upper right portion of your browser window.
  3. Select your top-level master account or POA account in the account dropdown.
  4. Click Done since there is no agent to install.
  5. When the quickstart is done installing, open the Data Governance Baseline dashboard.

That will bring you to the newly installed dashboard.

Dashboard Overview

The main overview tab shows a variety of charts including some powerful time series views.

Org Wide View

The second tab provides a baseline report by sub-account and usage metric.

Org Wide Tabular

The remaining tabs provide detailed views of specific telemetry types such as browser data, APM data, logs, and traces. For example, this screenshot shows the browser detail page:

Browser Detail Page

Detail tabs include:

  • APM - ApmEventsBytes
  • Tracing - TracingBytes
  • Browser - BrowserEventsBytes
  • Mobile - MobileEventsBytes
  • Infra (Host) - InfraHostBytes
  • Infra (Process) - InfraProcessBytes
  • Infra (Integration) - InfraIntegrationBytes
  • Custom Events - CustomEventsBytes
  • Serverless - ServerlessBytes
  • Pixie - PixieBytes

Caution

If you are using a POA account, be aware that the POA account itself will not appear when you facet by consumingAccountName. That's because the POA account receives no actual telemetry. If you install the dashboard into a normal parent account, the consumption value of that parent account is the sum of the data ingested directly into the parent account and the data sent to any sub-accounts.

For example, when faceting on monthly ingest you may see something like:

  • Parent Account Ingest: 100GB
  • Sub Account 1 Ingest: 50GB
  • Sub Account 2 Ingest: 20GB

Here, Parent Account Ingest includes the data for sub-accounts 1 and 2 (70 GB in total) plus an additional 30 GB that is sent directly to the parent account.
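The arithmetic behind this example can be sketched in a few lines of Python. The function name `direct_parent_ingest` is hypothetical, the figures are the ones from the list above, and the sketch assumes the parent total already includes every sub-account.

```python
def direct_parent_ingest(parent_total_gb, sub_account_gb):
    """Return the GB sent directly to the parent account.

    Assumes parent_total_gb already includes every sub-account's ingest,
    as described for a normal (non-POA) parent account.
    """
    direct = parent_total_gb - sum(sub_account_gb.values())
    if direct < 0:
        raise ValueError("sub-account ingest exceeds the parent total")
    return direct


sub_accounts = {"Sub Account 1": 50, "Sub Account 2": 20}
print(direct_parent_ingest(100, sub_accounts))  # 30
```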

Add ingest target indicators to your dashboard

In the prerequisites section we discussed the concept of a monthly usage target. You may actually have several targets to help keep you on track:

  • An overall organizational target on daily rate or monthly ingest.
  • Targets per data type to ensure the optimal breakdown (for example 1 TB per day for logs and 2 TB per day for metrics).
  • Targets for specific sub-accounts or business units.
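As a rough illustration of tracking several targets at once, the sketch below compares observed daily rates against per-type targets. The target figures echo the example above (1 TB per day for logs and 2 TB per day for metrics, expressed in GB); the observed values and the function name `over_target` are hypothetical.

```python
def over_target(observed_gb_per_day, target_gb_per_day):
    """Return {data type: GB per day over target} for types exceeding their target."""
    return {
        data_type: observed - target_gb_per_day[data_type]
        for data_type, observed in observed_gb_per_day.items()
        if data_type in target_gb_per_day
        and observed > target_gb_per_day[data_type]
    }


targets = {"logs": 1000, "metrics": 2000}   # GB per day, from the example above
observed = {"logs": 1250, "metrics": 1800}  # hypothetical measurements
print(over_target(observed, targets))  # {'logs': 250}
```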

In our example, the organization targets a total ingest of less than 360 TB per month. This was a new target, set after ingest had been reduced from over 20 TB per day (600 TB per month).

To make the target easier to measure against, we added a threshold line to the chart by adding the static number 360000 (360 TB expressed in GB) to our SELECT statement.

SELECT 360000, rate(sum(GigabytesIngested), 30 day) AS '30 Day Rate' FROM NrConsumption WHERE productLine='DataPlatform' SINCE 30 days ago LIMIT MAX COMPARE WITH 1 month ago TIMESERIES 7 days

Ingest target 30 day

We can also apply a daily rate target line. Dividing 360000 by 30 gives 12000, which we'll use as our daily rate target. Update the Daily Ingest Rate (Compare With 3 Months Prior) chart:

SELECT 12000, rate(sum(GigabytesIngested), 1 day) AS avgGbIngestTimeseries FROM NrConsumption WHERE productLine='DataPlatform' TIMESERIES AUTO SINCE 9 months ago LIMIT MAX COMPARE WITH 3 months ago

Ingest Target Daily
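The target values used in these two queries come from a simple unit conversion. The sketch below assumes decimal units (1 TB = 1000 GB) and a 30-day month, matching the 360000 and 12000 figures used here; the function name `ingest_targets` is hypothetical.

```python
def ingest_targets(monthly_target_tb, days_in_month=30):
    """Convert a monthly TB target into (monthly GB, daily GB) target values."""
    monthly_gb = monthly_target_tb * 1000  # decimal units: 1 TB = 1000 GB
    daily_gb = monthly_gb / days_in_month
    return monthly_gb, daily_gb


monthly_gb, daily_gb = ingest_targets(360)
print(monthly_gb, daily_gb)  # 360000 12000.0
```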

Generate a tabular 30 day ingest report

  1. Open the previously installed Data governance baseline dashboard.
  2. Click on the Baseline report tab.
  3. Click on ... in the upper right of the "Last 30 Days" table and choose Export as CSV.
  4. Import the CSV into Google Sheets or the spreadsheet of your choice.

Alternatively, if you did not install the dashboard, you can use this query to create a custom chart in Query Builder:

SELECT sum(GigabytesIngested) AS 'gb_ingest_30_day_sum', rate(sum(GigabytesIngested), 1 day) AS 'gb_ingest_daily_rate', derivative(GigabytesIngested, 90 day) AS 'gb_ingest_90_day_derivative' FROM NrConsumption WHERE productLine='DataPlatform' SINCE 30 days ago FACET consumingAccountName, usageMetric LIMIT MAX

Below is an example of a sheet we imported into Google Sheets.

DG Baseline Sheet

The screenshot shows the table sorted by 30 day ingest total.

Feel free to adjust your timeline and some of the details as needed. For example, we chose to extract a 90 day derivative to have some sense of change over the past few months. You could easily alter the time period of the derivative to suit your objectives.
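If you prefer to script the sorting rather than use a spreadsheet, a minimal Python sketch might look like the following. The column headers are assumed to match the facet names and aliases in the query above, and the rows are made-up sample data; check the header row of your actual export, since the dashboard's CSV may name its columns differently.

```python
import csv
import io

# Hypothetical sample standing in for the exported CSV file.
SAMPLE_CSV = """consumingAccountName,usageMetric,gb_ingest_30_day_sum
Sub Account 1,ApmEventsBytes,1200.5
Sub Account 2,BrowserEventsBytes,310.0
Sub Account 1,TracingBytes,4800.25
"""


def sort_by_ingest(csv_text, column="gb_ingest_30_day_sum"):
    """Parse CSV text and return rows sorted by a numeric column, largest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)


for row in sort_by_ingest(SAMPLE_CSV):
    print(row["consumingAccountName"], row["usageMetric"], row["gb_ingest_30_day_sum"])
```

To work from a real export, read the file's text and pass it to `sort_by_ingest` in place of `SAMPLE_CSV`.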

Customize your report

Add useful columns to your report in order to facilitate other phases of data governance such as optimize and forecast. The following fields will help guide optimization and planning decisions:

  • Notes: Note any growth anomalies and any relevant explanations for them. Indicate any major expected growth if foreseen.
  • Technical Contact: Name of the manager of a given sub-account or someone related to a specific telemetry type.
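One way to add these columns programmatically is sketched below. It assumes the report has already been loaded as a list of dictionaries (for example with `csv.DictReader`); the sample row and the helper names `add_governance_columns` and `to_csv` are hypothetical.

```python
import csv
import io


def add_governance_columns(rows):
    """Return copies of the rows with empty Notes and Technical Contact columns added."""
    return [{**row, "Notes": "", "Technical Contact": ""} for row in rows]


def to_csv(rows):
    """Serialize a non-empty list of dict rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


report = [{"consumingAccountName": "Sub Account 1", "usageMetric": "ApmEventsBytes"}]
print(to_csv(add_governance_columns(report)))
```

The new columns start out empty so that the people who own each sub-account or telemetry type can fill them in.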

Detect ingest anomalies

Alert on ingest anomalies

Use this ingest alerts guide to make sure that an increase in data consumption doesn't catch you by surprise. At a minimum, create:

  • A threshold alert to notify if you exceed monthly targets for data ingest beyond seasonal increases
  • A baseline alert to notify you of a sudden sharp increase in ingest data
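The difference between the two kinds of check can be illustrated with a small Python sketch. This only illustrates the two ideas and is not how New Relic alert conditions are evaluated; the function names, the 7-day window, and the 1.5x factor are arbitrary assumptions.

```python
from statistics import mean


def exceeds_threshold(month_to_date_gb, monthly_target_gb):
    """Static threshold check: has ingest passed the monthly target?"""
    return month_to_date_gb > monthly_target_gb


def sharp_increase(daily_gb, window=7, factor=1.5):
    """Flag the latest day if it exceeds `factor` times the mean of the prior `window` days."""
    if len(daily_gb) < window + 1:
        return False  # not enough history to compare against
    baseline = mean(daily_gb[-(window + 1):-1])
    return daily_gb[-1] > factor * baseline


history = [12000] * 7 + [19000]  # hypothetical daily ingest in GB
print(exceeds_threshold(365000, 360000))  # True
print(sharp_increase(history))            # True
```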

In addition to using alerts to identify consumption anomalies, you can use Lookout to explore potential ingest anomalies.

Lookout View

Lookout accepts nearly any NRQL query and searches its results for anomalies over a given period of time. This view is based on the query:

SELECT rate(sum(GigabytesIngested), 1 day) AS avgGbIngest FROM NrConsumption WHERE productLine='DataPlatform' FACET usageMetric

Lookout View Usage Metric

Change the facet field to consumingAccountName to get this view:

Lookout View Consuming Account

Install the entity breakdown dashboard (optional)

In a previous section, you installed the ingest baseline dashboard, which uses NrConsumption as its primary source. In addition to that high-level view, you can create other visualizations that use bytecountestimate() to estimate ingest for nearly any event or metric. The prerequisites section covers bytecountestimate() in detail.

  1. Go to the same quickstart you used for the baseline dashboard.
  2. Click Install this quickstart in the upper right section of your browser window.
  3. Don't install this instance of the dashboard into a POA account. Instead, install it into any account that contains APM, Browser, or Mobile applications or K8s clusters. You can install this dashboard into multiple accounts. You can also install it into a top-level parent account and modify the dashboard so you have account-specific charts all in one dashboard.
  4. Click Done since there is no agent to install.
  5. When the quickstart is done installing, open the Data Governance Entity Breakdowns dashboard.

Entity breakdown dashboard

You can refer back to this section to see exactly which event types are used in these breakdowns.

NOTE These queries consume more resources since they are not working from a pre-aggregated data source like NrConsumption. It may be necessary to adjust the time frames and to add WHERE clauses and a LIMIT clause to make them performant in some of your environments.

Install the cloud integration dashboard (optional)

Cloud Integrations can often be a significant source of data ingest growth. Without good visualizations it can be very difficult to pinpoint where the growth is coming from. This is partly because these integrations are so easy to configure and they are not part of an organization's normal CI/CD pipeline. They may also not be part of a formal configuration management system. Fortunately this powerful set of dashboards can be installed directly from New Relic I/O. Individual dashboards installed by this package include:

  • AWS Integrations
  • Azure Integrations
  • GCP Integrations
  • On-Host Integrations
  • Kubernetes

Infrastructure Integrations IO

Conclusion

The process section of this page took you through the creation of data ingest visualizations and reports. You can now review data ingest with a data-driven, visual approach that you and your peers can collaborate around.

Going forward, decide which visualizations to use for:

Additional resources

Manage Incoming Data
Data Management Hub
Drop Data Using Nerdgraph
Alert on Data Ingest Anomalies
Automating Telemetry Workflows
Metrics Aggregation and Events to Metrics

Copyright © 2022 New Relic Inc.