In order to begin forecasting our future data ingest we must develop an understand of the kinds of growth drivers that will potentially impact some or all of of our data sources. The following sections describe what we call general growth drivers. Finally we introduce the concept of a growth driver worksheet that can be used in your ingest Budgeting process.
Understanding Growth Drivers
Seasonal and Business Cycle Growth
It's critical to understand the sources of telemetry growth that will occur through the year and over the years. Some of these are generally anticipate and others may be unexpected and others completely anomalous. These concepts are important when coming up with baseline budgets and growth targets and can also help during an ad hoc resolution of unexpected telemetry growth. This class of user growth is often welcome but can also seem overwhelming if we have not data governance plan in place. Our business is growing and we are bringing in new users and the activity of each of those users is causing additional Browser, APM, and Log data to be emitted. The need to scale K8s clusters, load balancers, and supporting platforms like Kafka also cause an increase in the emitted telemetry. Another type of growth is caused by an increase in business transactions without an obvious increase in the number of users. For example a website that sells one type of product (Shoes) has now broadened its inventory to offer Hats and Gloves. This results in more business transactions per user causing a similar cascade in telemetry as an increase in users.
Code Refactor
There are some scenarios where a change in application code will cause a sudden increase in telemetry volume without any additional users or business transactions. Some examples:
- A developer adds additional java javascript code that interacts with the backend every time a user visits a page.
- A developer adds new logging code to some application methods that are called very frequently.
- A new database schema requires multiple database calls where previously one was needed.
- A monolithic application is broken into 5 microservices with the resulting APM and distriubuted trace data being emitted for each.
Instrumentation Misconfiguration
Field Example: An organization previously used a 30s sampling rate for core operating system metrics for about 2000 hosts. A misconfiguration or unauthorized change to 10s tripled the OS metrics telemetry captured from those hosts.
Later on we will discuss telemetry standards. One of the thing often governed by a telemetry standard is the sampling rate for various monitoring activities.
Increasing Breadth of Observability
Field Example: It is part of the continuous improvement process to expand observability. An organization that was monitoring Kafka broker health for nearly two years decided to start monitoring Kafa topic offsets. Not realizing the verbosity of topic monitoring data they are surprised when the telemetry footprint from Kafka monitoring increase 5 times.
Caution
Mergers and acquisitions are a common way in which telemetry growth "sneaks up" on an organization. We suggest you incorporate observability consolidation as a formal action item as part of the integration process. This is no different then adding cloud compute consolidation to your overall integration plans. This is also an area where you should fall back on your New Relic account team since there are times when these events are somewhat unexpected.
Unexpected Third Party Change
Field Example: A JMX integration is designed to get any metric exposed by a third party application with prefix "Transaction". With version 1.0 of the application that yields in 10 events per sample. The team maintaining the third party application adds 10 additional metrics with the "Transaction" prefix. When our team installs the new code, we are a bit surprised to know what JMX events have increased 2 times.
Growth in User Counts or in Overall User Activity
[TBD]
Growth Driver Worksheet
Understanding the growth drivers of your telemetry is as important as understanding the value drivers. With growth drivers there are ways in which they are highly correlated, but in generally we should choose the main and most direct driver. The following growth drivers can be used to augment the telemetry backlog with information that can be used to understand the quarterly and yearly growth characteristics.
- Increase in Active Users
- Increase in Number of Products
- Infrastructure Scaling Initiative
- Innovation Initiatives
- Efficency Efforts (Reduction of Telemetry Ingest)
- Major code refactor
If your business operation is growing it makes sense to factor in growth. It wouldn't make sense to deploy a system and expect cloud compute costs to be flat despite growing users 20% or adding 2 times more products in a year. This attention to growth drivers is no different.
As part of the data governance practice we recommend you generate a sheet like this for each sub-account. These driver sheets will be used in the planning framework to generate forecasts of telemetry growth.
Team | Growth Driver |
---|---|
Streaming Video | This team is refactoring some K8s infrastructure. Currently the have on-prem K8s clusters managed in their data center. They expect to be spinning up some new clusters in their AWS VPC and for much of the quarter they may have redundant infra. The K8s Telemetry SME helped them arrive a growth rate of just under 5% for the quarter. In the quarter after this they may have a flat or negative reate as they bring down the on-prem clusters |
Cloud Platform Team | This team has plans to reduce log volume substantiall by getting rid of some excessively chatty, low value logs from some of their cloud services. Using a deep dive analysis using bytecountestimate() they came up with a plan to reduce ingest by 5% over the quarter. So they should see negative growth rate over 90 days |
International Services | This teams plan to add support for two additional countries. Working with the APM K8s and Mobile SMEs they were able to come up with an estimate of 7.5% groth, mustly coming from increased Mobile events. Since they have good forecasts of how much user activity they should see they were able to built a relatively good model based on current ingest with the 5 countries they currently support. |
Shipping & Receiving | This team plans to add application logs this quarter. Using estimates derived from the number of logs current recorded to disk and using some factors to account for the additional logs-in-context tags that will be added. This team expects a 12.6% growth this quarter. The Logging SME has given them excellent guidance on using New Relic drop rules as well as how to streamline the data in Fluentbit so they are confident that they will be able to steer into this estimate |
Marketing Technology | This team is refactoring a Java monolith into 3 or 4 separate microservices. Based on some code analysis from other refactors and a careful audit of the Telemetry behavior of the monolith thsi team has forecast a 26.7% growth rate. This is relatively large. However this is the kind of refactor that should leave the code base relatively stable for another 3 to 5 years. |
As part of this analysis we also recommend you generate an overall account growth factor based on your analysis of the baseline reports and the growth driver sheet. This growth drivefr is a single number that will be used in the final telemetry budget sheet.
Additional Technical Resources
Alert on Data Ingest Anomalies