Data Latency: What You Need to Know Right Now (Or Pretty Soon, Anyway)

Published on March 11, 2019/Last edited on March 11, 2019/11 min read

AUTHOR
Anna Mongillo
Business Intelligence Analyst at Braze

You may not realize it, but you’re living in the past.

In this world, information doesn’t travel instantaneously. Think about the time it takes for light to bounce off an object to reach your eye, for your nerves to transmit sensations from your skin to your brain, or for music to travel up through the wires of your headphones to the speakers in your ears. These things take time. And that’s okay with us; the delay is minimal, even imperceptible, because we’ve adjusted to it. It’s part of our daily lives.

Another thing that takes time? Data flowing through the internet. It has a destination, and it usually won’t get there right away. (That’s the trouble with tubes…) At Braze, we know all about this phenomenon. It’s called data latency.

What is Data Latency?

Data latency, for our purposes, refers to the time delay between when data is generated and when that data is available to use. Much like the delays you unconsciously experience in your day-to-day life, data latency may be something you’ve never noticed before. Nevertheless, it’s probably there, and it could be affecting your customer engagement efforts.

Why Does Data Latency Happen (and What Does It Affect)?

Generally, latency is thought of as the time it takes for data to travel from server to server. But in the world of digital marketing, mobile devices come into play, bringing with them a host of additional considerations—think user location, poor network connection, low bandwidth, and software and hardware that don’t support quick data transmission. Therefore, we like to think of data latency in terms of two types of delay: 1) a delay between when a user generates data and when it is sent from that user’s device to your systems, and 2) a delay in the time it takes for that data to become available for business use, even after your systems have ingested or received it.

These delays affect one key ingredient of the brilliant experiences that Braze strives to help our customers deliver: timeliness. Focusing on timeliness means ensuring our customers receive their data quickly enough to use it as part of a customer interaction. We know how important it is to marry your data with your marketing strategy and to use that data as quickly and accurately as possible. So, what does that look like with data latency as part of the equation? We decided to dig into data from Braze Currents, our high-volume data export tool, to determine how long it takes for our customers to be able to use the data that’s generated, what that means, and what they can do about it.

Exploring Data Latency with Braze Currents

If you’re unfamiliar with our Currents product, you should know that it’s a data streaming export tool that allows brands to receive a continuous stream of granular event data—like who opened a brand’s app, visited its website, or clicked one of its emails—and trace those events down to the second. We looked at 34 billion of those Currents events for our latency analysis and found that:

  • 73% of all Currents events were processed within one minute
  • 93% were processed within five minutes
  • 99% were processed within one day

The remaining 1% of data saw latency ranging anywhere from one day to 116 years. (That last figure is almost certainly because a user manually changed the clock on their device, altering the recorded timestamp of the event. Either that, or they traveled back to 1903 in a DeLorean.)
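To see where your own events fall against thresholds like these, a minimal Python sketch such as the one below may help. It assumes you already have each event’s generation timestamp and processing timestamp. The threshold list mirrors the figures above. The one-year cutoff for flagging a suspect device clock is an illustrative assumption, not a Braze rule, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Cumulative thresholds mirroring the Currents figures above.
THRESHOLDS = [
    ("within 1 minute", timedelta(minutes=1)),
    ("within 5 minutes", timedelta(minutes=5)),
    ("within 1 day", timedelta(days=1)),
]

# Assumption: negative delays or delays beyond a year point to a bad device clock.
MAX_PLAUSIBLE = timedelta(days=365)


def latency(event_time: datetime, processed_time: datetime) -> timedelta:
    """Delay between when an event was generated and when it was processed."""
    return processed_time - event_time


def is_suspect(lat: timedelta) -> bool:
    """Flag delays that are negative or implausibly long."""
    return lat < timedelta(0) or lat > MAX_PLAUSIBLE


def summarize(latencies: list[timedelta]) -> dict[str, float]:
    """Share of events at or under each threshold, plus the share with suspect clocks."""
    n = len(latencies)
    summary = {
        label: sum(timedelta(0) <= lat <= limit for lat in latencies) / n
        for label, limit in THRESHOLDS
    }
    summary["suspect clock"] = sum(is_suspect(lat) for lat in latencies) / n
    return summary


now = datetime(2019, 1, 21, 12, 0)
sample = [
    latency(now - timedelta(seconds=20), now),
    latency(now - timedelta(minutes=3), now),
    latency(now - timedelta(hours=6), now),
    latency(datetime(1903, 1, 21), now),  # a device clock set back to 1903
]
print(summarize(sample))
# {'within 1 minute': 0.25, 'within 5 minutes': 0.5, 'within 1 day': 0.75, 'suspect clock': 0.25}
```

The function keeps suspect timestamps out of the threshold shares and reports them separately. A single mis-set clock should not look like "fast" data.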

Where else could the latency we found be coming from? Over the course of our analysis, we formed a primary hypothesis: that the amount and extent of the latency would differ based on event source.

To investigate, we divided the Currents data into two categories: 1) device-generated events that are triggered by end user activity, like a user opening an app, tapping on a push notification, or performing some custom event, and 2) service-generated events sent to us by third-party sources like Apple, Google, or an email partner when, for instance, a push bounces, email activity occurs, or an app uninstall is detected.

Why Should I Care About Data Latency?

Understanding data latency—if you’re experiencing it and why, and knowing how you can use that knowledge to your advantage—can do a lot to help you stay competitive in a world that’s becoming more data-driven every day. In this environment, brands need to be able to successfully interpret the data at their disposal to inform their digital marketing decisions and build an awareness of their users, and knowing about data latency could be the difference between guessing about them and having a true understanding.

Your users and their behaviors and preferences are constantly changing, and composing an accurate picture of exactly when, how and why those changes are happening will help you connect with them in a more human way. Collect and act on your users’ data thoughtfully and effectively, and you’ll gain their trust, build credibility, and have a greater positive impact on their brand experience. But act on misleading or inaccurate data and you run the risk of weakening your competitive standing and missing out on chances to reach users in a meaningful way, like sending the perfect re-engagement campaign or personalized discount deal at the right time.

We found that service-generated events generally had low latency, with 99% of events processed within a fifteen-minute window. On the other hand, only 94% of device-generated events were available within fifteen minutes. Device-generated events accounted for 84% of all latency and 97% of all events late by over fifteen minutes. That makes these events responsible for most of the latency we found.

Why? Think for a moment about all the situations you’ve been in when you opened an app or engaged with a message on your device. Maybe you opened an app when you had no Wi-Fi or poor service (on the subway, say) and left before your data could be sent, so it didn’t reach the server until the next time you opened the app. Or maybe you live in a region where battery and data usage concerns mean that your device settings don’t allow for regular data transmission from your phone or computer. These issues uniquely affect device-generated events because device-generated events are intrinsically tied to a user’s situation: the type of device they’re using, their network connection, their location, their settings, and more.

While service-generated events may still have some latency (remember, information doesn’t travel instantaneously), they aren’t dependent on a device or a user to the extent that device-generated events are. Service-generated events like push bounces or uninstalls are logged primarily based on notifications from Google or Apple servers that some event has happened. As for email events, Braze has partners that provide a server-to-server connection, delivering high volume at enterprise scale and minimizing the friction associated with information transfer. As a result, service-generated events aren’t nearly as susceptible to the issues described above, so it’s no surprise that they show low latency.

But for devices, we're at the mercy of the tubes.
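The device-versus-service comparison above comes down to two ratios per source. First, the share of each source’s events that arrive within fifteen minutes. Second, each source’s share of all events that arrive later than that. The sketch below assumes every event has already been labeled "device" or "service" and paired with its latency. The labels and function names are hypothetical.

```python
from collections import defaultdict
from datetime import timedelta

WINDOW = timedelta(minutes=15)


def share_within_window(events, window=WINDOW):
    """Fraction of each source's events processed within `window`.

    `events` is an iterable of (source, latency) pairs.
    """
    on_time, total = defaultdict(int), defaultdict(int)
    for source, lat in events:
        total[source] += 1
        if lat <= window:
            on_time[source] += 1
    return {source: on_time[source] / total[source] for source in total}


def share_of_late_events(events, window=WINDOW):
    """Fraction of all events later than `window` that each source contributed."""
    late = defaultdict(int)
    for source, lat in events:
        if lat > window:
            late[source] += 1
    n_late = sum(late.values())
    return {source: count / n_late for source, count in late.items()} if n_late else {}


events = [
    ("service", timedelta(minutes=1)),
    ("service", timedelta(minutes=2)),
    ("device", timedelta(seconds=30)),
    ("device", timedelta(hours=2)),
]
print(share_within_window(events))   # {'service': 1.0, 'device': 0.5}
print(share_of_late_events(events))  # {'device': 1.0}
```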

What Other Potential Causes Should I Watch For?

While most of the data latency we saw in Currents was caused by delays between when a user device generates data and when that data is sent from the user’s device and received by us, there’s also the possibility that your company's data infrastructure or ETL (Extract, Transform, Load) processes are introducing some latency, causing an extra delay even after your systems have received the data. For example, we often see teams implement daily batch processes where data isn't even made available to other systems for up to an entire day. And even after that’s over, data may need to flow from one system to another, introducing more latency.
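Here is a minimal sketch of how a batch schedule stacks on top of device-side delay. It assumes a hypothetical pipeline that runs once a day at 02:00 and publishes everything received before that run, and it ignores how long the job itself takes. The schedule and function names are illustrative only.

```python
from datetime import datetime, time, timedelta


def available_at(received: datetime, batch_run: time = time(2, 0)) -> datetime:
    """When data received at `received` becomes usable under a once-daily batch job."""
    run_today = datetime.combine(received.date(), batch_run)
    # Data that lands before today's run is published by it; otherwise it waits a day.
    return run_today if received < run_today else run_today + timedelta(days=1)


def end_to_end_latency(generated: datetime, received: datetime,
                       batch_run: time = time(2, 0)) -> timedelta:
    """Device-to-server delay plus the extra wait added by the batch pipeline."""
    return available_at(received, batch_run) - generated


# An event generated at 02:59 reaches your servers one minute later, after
# that day's 02:00 run, so it is not usable until the next day's run.
print(end_to_end_latency(datetime(2019, 1, 10, 2, 59), datetime(2019, 1, 10, 3, 0)))
# 23:01:00
```

In this example, the device contributed only one minute of delay. The pipeline contributed the other 23 hours.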

Okay, But Why Should I Care About Data Latency?

It’s simple. Recognizing and quantifying the latency in your data—and acting on it appropriately—will help you make quicker and more accurate marketing decisions, build more compelling campaigns, and connect with your users in a more human way.


What Can I Do About Data Latency Now?

This newfound data knowledge shouldn’t bog you down; it should serve as a tool.

First, it’s practical to assume some amount of latency in your data. Ask your data team to measure that latency and document what’s expected. You can work together from there to determine what is acceptable for your business and its specific needs and use cases. If you have more data latency than you can live with, search for the root causes and decide what to do about it.
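One simple way to measure latency and document what’s expected is a small percentile report checked against a budget your teams agree on. The sketch below uses nearest-rank percentiles. The budget values are placeholders, not recommendations, and the function names are hypothetical.

```python
import math
from datetime import timedelta


def percentile(sorted_values, p):
    """Nearest-rank percentile (0 < p <= 100) of an already sorted, non-empty list."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def latency_report(latencies, ps=(50, 95, 99)):
    """Map e.g. 'p95' to the latency that 95% of events meet or beat."""
    ordered = sorted(latencies)
    return {f"p{p}": percentile(ordered, p) for p in ps}


def over_budget(report, budget):
    """Return the percentiles whose measured latency exceeds the agreed budget."""
    return [key for key, limit in budget.items() if report[key] > limit]


# Placeholder budget: decide the real numbers with your data team.
budget = {"p95": timedelta(minutes=5), "p99": timedelta(hours=1)}
latencies = [timedelta(seconds=s) for s in range(1, 101)]
report = latency_report(latencies)
print(report["p95"], over_budget(report, budget))  # 0:01:35 []
```

A non-empty result from `over_budget` is the signal to go looking for root causes.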

Once you have an idea of how much latency to expect, think about the right course of action for the particular set of data you’ve collected. Let’s take our Currents analysis as an example. If marketers at Braze were making decisions based on that data, we could use our findings to put exception logic in place that accounts for the type and extent of the latency we’re seeing. We could also use it to determine the right time to assess the information at our disposal: that is, make a decision within a minute based on roughly 70-80% of our users’ data, or wait fifteen minutes to have nearly all of it before acting. Amazon founder and CEO Jeff Bezos once argued that most decisions should be made with about 70% of the information you wish you had; if you wait for 90%, you’re probably being slow. Do you need the full data set, or should you focus on moving forward, fast?

Of course, your answer is going to depend on the situation and what your brand is looking to accomplish. But be mindful of the fact that sometimes speed is more important than having all the possible data at your disposal—so don’t let analysis paralysis stunt your ability to execute on your marketing strategy.
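The speed-versus-completeness tradeoff can be read directly off a completeness curve, which shows how much of the data has arrived after a given wait. The sketch below plugs in the Currents figures from earlier in this post. Your own curve will differ, and the function name is hypothetical.

```python
from datetime import timedelta

# Cumulative share of Currents events available after each wait (figures from this post).
COMPLETENESS = [
    (timedelta(minutes=1), 0.73),
    (timedelta(minutes=5), 0.93),
    (timedelta(days=1), 0.99),
]


def wait_for(target_share, curve=COMPLETENESS):
    """Shortest wait on `curve` that yields at least `target_share` of the data, else None."""
    for wait, share in sorted(curve):
        if share >= target_share:
            return wait
    return None


print(wait_for(0.70))   # 0:01:00
print(wait_for(0.90))   # 0:05:00
print(wait_for(0.999))  # None
```

With these figures, acting on about 70% of the data means waiting a minute, and acting on 90% means waiting five. A `None` result means no measured wait gets you there.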

Also, keep in mind that the systems you use can do a lot to address or exacerbate the impact of data latency. For instance, Braze is built to minimize negative impacts from delays. Controls in the Braze platform will prevent your campaigns from triggering off delayed data, automatically compensating for the latency to ensure the best messaging experience. Meanwhile, Currents will continue to send the raw event data unmodified, so you can use and explore it as you see fit. So when you’re choosing technologies to work with, make sure they’ll help address data latency, not exacerbate it.
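The sketch below illustrates the general idea of guarding against triggering off delayed data. It is not how Braze implements its controls. The check compares the event’s own timestamp with the current time and declines to fire for anything older than a cutoff. The 15-minute cutoff and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: events older than this are too stale to trigger a message.
MAX_TRIGGER_AGE = timedelta(minutes=15)


def should_trigger(event_time, now=None, max_age=MAX_TRIGGER_AGE):
    """True only if the event happened recently enough to act on.

    Events stamped in the future (a skewed device clock) are also rejected.
    This only decides whether to send; keeping or exporting the raw event
    is left to the caller.
    """
    now = now or datetime.now(timezone.utc)
    age = now - event_time
    return timedelta(0) <= age <= max_age


now = datetime(2019, 3, 11, 12, 0, tzinfo=timezone.utc)
print(should_trigger(now - timedelta(minutes=5), now))  # True
print(should_trigger(now - timedelta(hours=2), now))    # False
print(should_trigger(now + timedelta(days=1), now))     # False
```

The guard is kept separate from storage on purpose. In the same way, the raw events still flow through unmodified for analysis, even when they are too late to drive a message.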

Finally, talk to your data team about their next steps. From our data team to yours, here are some things they should be looking at:

  • Quality assurance. Your data team should reassess any data quality checks and tests with data latency in mind.
  • Indexing practices. It may be useful for your data team to reindex your brand’s data on a different timeline.
  • Machine learning and AI models. Your data team might want to adjust their algorithms to reflect the latency in your brand’s data. It could be valuable to incorporate lagged variables into their data model, look into distributed lag models, or model that latency as an autocorrelation.
  • Data retention and governance policies. If your retention policy is 30 days and you’ve found that one in a million data points arrives 30 days late, your data team is going to need to assess why that is and whether you’re violating legislation like the EU’s General Data Protection Regulation (GDPR). Is your data team’s policy based on when your brand ingests the data, or when the event actually occurs?
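The retention question in the last bullet comes down to which timestamp starts the clock. The sketch below assumes a hypothetical 30-day policy and uses illustrative function names. It shows how the same late event expires at very different times depending on whether retention counts from event time or ingestion time. It does not determine whether either policy is compliant with any regulation.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # hypothetical retention period


def expires_at(event_time, ingested_time, anchor="event", retention=RETENTION):
    """Deletion deadline when retention is counted from event time or ingestion time."""
    if anchor not in ("event", "ingestion"):
        raise ValueError(f"unknown anchor: {anchor!r}")
    start = event_time if anchor == "event" else ingested_time
    return start + retention


def expired_on_arrival(event_time, ingested_time, retention=RETENTION):
    """Under an event-time policy, a late record can already be past its deadline."""
    return ingested_time >= expires_at(event_time, ingested_time, "event", retention)


event, ingested = datetime(2019, 1, 1), datetime(2019, 2, 5)  # arrives 35 days late
print(expires_at(event, ingested, "event"))      # 2019-01-31 00:00:00
print(expires_at(event, ingested, "ingestion"))  # 2019-03-07 00:00:00
print(expired_on_arrival(event, ingested))       # True
```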

Data Knowledge is Power

As customer engagement becomes more and more data-driven, it’s increasingly important that all areas of your organization are up to speed on your data and how it’s being handled. It’s equally critical that brands focus on timeliness as a key element of their marketing strategy. So in the end, developing internal knowledge about data latency will only alter your customers' brand experience for the better. The more you can understand and react to the latency in your data, the faster you'll be able to make accurate decisions, and the better you'll be able to understand and engage your users.

So get out there and deliver on those brilliant experiences.

Want to learn about how to make your data more actionable, quickly and easily? Check out our look at how Braze has integrated Currents with Looker using Looker Blocks.

Methodology

This analysis drew on Currents data spanning from December 23, 2018 to January 21, 2019, and included 34.4 billion event IDs. Event counts (distinct counts of event IDs) were aggregated and grouped by event type and the time between event time and server processing time.
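Here is a minimal sketch for readers who want to run a similar aggregation on their own export. It assumes each row carries an event ID, an event type, an event timestamp, and a processing timestamp. The latency bins and the event type names (`app_open`, `push_bounce`) are illustrative only, since the methodology above does not specify the exact bins used.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative latency bins; negative delays get their own bucket.
BUCKETS = [
    (timedelta(minutes=1), "<=1m"),
    (timedelta(minutes=5), "<=5m"),
    (timedelta(minutes=15), "<=15m"),
    (timedelta(days=1), "<=1d"),
]


def bucket(lat):
    """Label a latency with the first bin it fits in."""
    if lat < timedelta(0):
        return "negative"
    for limit, label in BUCKETS:
        if lat <= limit:
            return label
    return ">1d"


def aggregate(rows):
    """Distinct event-ID counts keyed by (event_type, latency bucket).

    `rows` holds (event_id, event_type, event_time, processed_time) tuples.
    """
    ids = defaultdict(set)
    for event_id, event_type, event_time, processed_time in rows:
        ids[(event_type, bucket(processed_time - event_time))].add(event_id)
    return {key: len(seen) for key, seen in ids.items()}


t = datetime(2019, 1, 1, 12, 0)
rows = [
    ("a", "app_open", t, t + timedelta(seconds=10)),
    ("a", "app_open", t, t + timedelta(seconds=10)),  # duplicate row, counted once
    ("b", "app_open", t, t + timedelta(hours=3)),
    ("c", "push_bounce", t, t + timedelta(minutes=2)),
]
print(aggregate(rows))
# {('app_open', '<=1m'): 1, ('app_open', '<=1d'): 1, ('push_bounce', '<=5m'): 1}
```

Counting distinct IDs instead of rows keeps duplicate deliveries of the same event from inflating any bucket.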
