It's hard to trust data you don't believe in. Maybe you've recently invested in a new analytics implementation only to find some glaring discrepancies during data validation. Or maybe you’ve had GA for a while now but know not to trust the eCommerce numbers because they’re always off. This problem exists across companies and industries. The big question is: why is my Google Analytics data different from my source of truth?
The truth is that we never expect them to be the same; we expect them to be close. For several reasons, you’re never going to get a 100% match for every metric across every data point. How close you get will vary depending on what data you’re looking at, which tools are collecting it, how they’re implemented, and how much traffic you have (among other factors).
Allow me to offer a general rule of thumb: when the discrepancy crosses the 10% threshold, you need to investigate. Less than 10% is okay, 5% is quite good, and 3% or less is cause for celebration! You may decide to investigate at 6% or 8.5%; there isn’t a hard and fast rule here. If you think you can close the gap, by all means do, especially for your high-value metrics.
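To make the rule of thumb concrete, here is a minimal Python sketch that measures the gap relative to your source of truth and buckets it. The function names and exact cutoffs are illustrative assumptions that mirror the thresholds above. Adjust `investigate_above` to whatever your team agrees on.

```python
def discrepancy_pct(source_of_truth: float, ga_value: float) -> float:
    """Absolute gap between GA and the source of truth, as a % of the source of truth."""
    if source_of_truth == 0:
        raise ValueError("source of truth is zero; a percentage gap is undefined")
    return abs(source_of_truth - ga_value) * 100 / abs(source_of_truth)


def assess(pct: float, investigate_above: float = 10.0) -> str:
    """Bucket a discrepancy using the rough rule of thumb (cutoffs are adjustable)."""
    if pct > investigate_above:
        return "investigate"
    if pct <= 3:
        return "celebrate"
    if pct <= 5:
        return "quite good"
    return "okay"


# Backend says $50,000 in revenue, GA says $45,500.
gap = discrepancy_pct(50_000, 45_500)
print(gap, assess(gap))  # 9.0 okay
```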
Let’s dive into an example. You pull last month’s eCommerce revenue numbers from your backend source of truth (i.e., data from your eCommerce solution or shopping cart software), and you pull your revenue numbers from Google Analytics. They don’t match; they’re off by about 10%. You think to yourself, “Whatever, I’ll use my backend data.” And you know what? For a large business decision, I would agree with you. Obviously you’re going to trust your source of truth over GA when it comes to sales and revenue data.
And yet, here’s something to consider: with this “I’ll just go to the source of truth” mindset, you will end up with many data sources that you have to pull from, sometimes manually, each month. There is incredible value in making your GA data as accurate as possible and in integrating and linking it with other datasets. Then you’ll have one place to go for a quick view of all your data, including user behavior on the website, A/B test results, email signups, accounts created, advertising data, campaign performance, shopping behavior, product and revenue data, etc.
But back to our question...
Why are the numbers different?
1. Your implementation is incorrect.
We’d like to assume it isn’t, but you have to rule this out first. The two systems certainly work differently, so the first step is to understand what that difference is.
Let’s say you know that when an order is completed on your website, the call goes to the server, the order is marked successful, and the transaction is counted in your backend. When is this information sent to Google Analytics? From the server, with the Measurement Protocol? That method has been reported to work well. Or maybe, once a confirmation page loads, you’re sending a dataLayer push on the page or via GTM with Enhanced eCommerce information. Okay. Will some transactions drop off if the page loads too slowly or the user leaves immediately? Maybe. Will you get duplicate transactions if the user refreshes the page or comes back after the session ends? Maybe. Knowing what these baseline discrepancies are is helpful. If your discrepancy is very large, there may be a more serious implementation problem: the transaction isn’t always being sent to GA, not all of the data is included, it’s firing at the wrong time, etc.
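If you go the server-side route, the key is to send the same transaction ID your backend uses and to send each order only once. Below is a minimal Python sketch that builds an Enhanced eCommerce purchase hit for the Universal Analytics Measurement Protocol and skips orders it has already sent. It assumes a UA property: the `collect` endpoint and parameter names such as `ti`, `tr`, and `pr1id` come from the UA Measurement Protocol, and GA4’s Measurement Protocol uses a different, JSON-based format. The `send` callable and the order dictionary layout are hypothetical. The in-memory set stands in for whatever persistent store you’d really use.

```python
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol endpoint; a real `send` would POST here.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_purchase_hit(tracking_id: str, client_id: str, order: dict) -> str:
    """URL-encode an Enhanced eCommerce purchase hit for one backend order."""
    params = {
        "v": "1",               # protocol version
        "tid": tracking_id,     # property ID, e.g. "UA-XXXXX-Y"
        "cid": client_id,       # the visitor's GA client ID
        "t": "event",           # hit type that carries the purchase action
        "ec": "ecommerce",
        "ea": "purchase",
        "pa": "purchase",       # Enhanced eCommerce product action
        "ti": order["id"],      # reuse the backend's transaction ID
        "tr": f"{order['revenue']:.2f}",
        "tt": f"{order['tax']:.2f}",
        "ts": f"{order['shipping']:.2f}",
        "cu": order["currency"],
    }
    for i, item in enumerate(order["items"], start=1):
        params[f"pr{i}id"] = item["sku"]          # same SKU the backend stores
        params[f"pr{i}pr"] = f"{item['price']:.2f}"
        params[f"pr{i}qt"] = str(item["qty"])
    return urlencode(params)


class PurchaseSender:
    """Sends each transaction ID at most once (hypothetical helper)."""

    def __init__(self, send):
        self._send = send          # callable that delivers the payload
        self._sent_ids = set()     # use a database or cache in production

    def record(self, tracking_id: str, client_id: str, order: dict) -> bool:
        if order["id"] in self._sent_ids:
            return False           # already sent for this order: skip the duplicate
        self._send(build_purchase_hit(tracking_id, client_id, order))
        self._sent_ids.add(order["id"])
        return True
```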
2. You’re not sending the same data.
In Google Analytics Enhanced eCommerce, many fields are “optional,” like tax and shipping. If you choose not to send that data, it won’t appear anywhere in GA, but it will appear in your backend system. Or you may be sending different values for product SKUs or transaction IDs because you generate (or pull) them with a different method, or from a different source, than your backend. If you make sales in multiple currencies, are you converting them with the same method at the same time? There are several ways you could be sending slightly different data to GA than to your source of truth. Know what the differences are so you can account for them in your analysis.
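One way to make sure multi-currency sales are converted the same way on both sides is to route every conversion through a single function with one rate source and one rounding rule. This is a minimal sketch. The rate table, its (currency, date) keying, and the choice to use the order date’s rate are assumptions you would replace with your own policy.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_reporting_currency(amount: str, currency: str, order_date: date,
                          rates: dict, reporting_currency: str = "USD") -> Decimal:
    """Convert an order amount using the rate for the order's date, rounded to cents."""
    if currency == reporting_currency:
        rate = Decimal("1")
    else:
        # A KeyError here means the rate table is missing a day: fix the table.
        rate = rates[(currency, order_date)]
    return (Decimal(amount) * rate).quantize(CENT, rounding=ROUND_HALF_UP)
```

Call this same function when building the GA payload and when preparing the backend report. If one side uses the order date’s rate and the other uses a month-end rate, the totals can drift apart even when every order is captured.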
3. You’re not comparing the same data.
A little processing on a data pull can go a long way toward bringing your numbers into closer alignment.
Tax and Shipping: Even if you’re sending tax and shipping, your backend’s “Total Revenue” may not include them. In GA, Total Revenue does include tax and shipping (Product Revenue does not). When you pull reports, make sure you’re comparing apples to apples: the same metric or dimension may be defined differently, named differently, or captured differently in different systems. If you aren’t sending tax and shipping data to GA, be sure to remove it from your backend revenue number before you compare the two.
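Here’s a small sketch of that adjustment. It sums backend orders using whichever definition matches the GA metric you’re comparing against. The `BackendOrder` fields are assumptions about what your backend export contains.

```python
from dataclasses import dataclass


@dataclass
class BackendOrder:
    order_id: str
    subtotal: float   # product revenue, before tax and shipping
    tax: float
    shipping: float


def comparable_revenue(orders, include_tax: bool, include_shipping: bool) -> float:
    """Sum backend revenue using the same definition as the GA metric being compared."""
    total = 0.0
    for order in orders:
        total += order.subtotal
        if include_tax:
            total += order.tax
        if include_shipping:
            total += order.shipping
    return round(total, 2)
```

Against GA Product Revenue, or when you don’t send tax and shipping to GA at all, pass `include_tax=False, include_shipping=False`. Against GA Total Revenue with tax and shipping sent, pass `True` for both.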
Refunds and Cancellations: Your backend system will have much better information on refunds and cancellations. You can pull this data into GA using Data Import, but if you aren’t doing that, see if you can parse out the transaction IDs that have refunds or partial refunds associated with them (or at least remove the refunded revenue from your backend pull before comparing to GA).
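If you have a list of refunded transaction IDs from your backend, dropping those IDs from both datasets is a simple way to compare like with like. This is a minimal sketch, assuming both exports can be reduced to a mapping of transaction ID to revenue.

```python
def exclude_refunded(backend: dict, ga: dict, refunded_ids: set) -> tuple:
    """Drop every transaction with a full or partial refund from both sides.

    `backend` and `ga` map transaction ID -> revenue.
    """
    backend_kept = {tid: rev for tid, rev in backend.items() if tid not in refunded_ids}
    ga_kept = {tid: rev for tid, rev in ga.items() if tid not in refunded_ids}
    return backend_kept, ga_kept
```

After filtering, compare `sum(backend_kept.values())` with `sum(ga_kept.values())`. Any gap that remains can’t be blamed on refunds.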
Test Transactions: Your GA data might include test transactions if you aren’t sending these to a test property in GA. Can you isolate these and remove them before comparing to your source of truth?
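If test orders follow a recognizable convention, you can strip them out in a single pass. The `TEST-`/`test_` prefix here is a hypothetical convention. Swap in whatever marks test orders on your site, or pass an explicit list of known test IDs.

```python
import re

# Hypothetical convention: QA orders use IDs such as "TEST-1234" or "test_5678".
TEST_ID_PATTERN = re.compile(r"^test[-_]", re.IGNORECASE)


def drop_test_transactions(rows, known_test_ids=frozenset()):
    """Remove rows whose transaction ID looks like, or is listed as, a test order."""
    return [
        row for row in rows
        if not TEST_ID_PATTERN.match(row["transaction_id"])
        and row["transaction_id"] not in known_test_ids
    ]
```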
These are just some of the ways you could be comparing different data for eCommerce reporting. You’ll need to think about which reports you’re seeing discrepancies for and be certain you are comparing the same data. Can you think of any reasons for differences? What can you rule out? Sometimes it can help to pull a smaller amount of data, like one week or one day, to investigate. Are you using the same time zone? Can you get the transaction IDs to line up? Are there duplicates? Is GA missing transactions? Can you find any pattern to them? Are they mostly mobile transactions or international transactions or coming from the same browser version? You’ll have to approach your specific problem with the knowledge you have about your business, your website, your customers, and your implementation.
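When you pull a single day or week from both systems, a few set operations answer several of those questions at once. This sketch assumes each export is a list of dictionaries with a `transaction_id` key and, on the backend side, a segment field such as `device`. Those field names are placeholders for whatever your exports actually contain.

```python
from collections import Counter


def reconcile(backend_rows, ga_rows, segment_key="device"):
    """Line up transaction IDs between a backend export and a GA export."""
    backend_ids = {row["transaction_id"] for row in backend_rows}
    ga_counts = Counter(row["transaction_id"] for row in ga_rows)
    ga_ids = set(ga_counts)

    missing_in_ga = backend_ids - ga_ids
    # Look for a pattern among the orders GA never received (device, country, browser...).
    missing_by_segment = Counter(
        row.get(segment_key, "unknown")
        for row in backend_rows
        if row["transaction_id"] in missing_in_ga
    )
    return {
        "missing_in_ga": sorted(missing_in_ga),
        "only_in_ga": sorted(ga_ids - backend_ids),  # test orders? mismatched IDs?
        "duplicates_in_ga": sorted(t for t, n in ga_counts.items() if n > 1),
        "missing_by_segment": dict(missing_by_segment),
    }
```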
4. Your data is sampled.
While the threshold for sampling has gotten higher in recent years, especially for pre-aggregated data tables, it’s always worth confirming that your data isn’t sampled. Sampling would be an obvious reason for a discrepancy. Check for sampling in your report, and if it’s there, try pulling a smaller date range (a month instead of a quarter or a year).
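If you pull reports through the API rather than the interface, you can check for sampling programmatically and split a long range into months. This minimal sketch assumes responses from the Universal Analytics Reporting API v4, where a report’s `data` object includes `samplesReadCounts` and `samplingSpaceSizes` only when the results are sampled. The helper names are hypothetical.

```python
from datetime import date, timedelta


def is_sampled(report: dict) -> bool:
    """True if a Reporting API v4 report carries sampling metadata."""
    data = report.get("data", {})
    return "samplesReadCounts" in data or "samplingSpaceSizes" in data


def month_ranges(start: date, end: date):
    """Split [start, end] into calendar-month chunks to request separately."""
    ranges = []
    current = start
    while current <= end:
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        chunk_end = min(end, next_month - timedelta(days=1))
        ranges.append((current, chunk_end))
        current = chunk_end + timedelta(days=1)
    return ranges
```

Summing monthly results works for additive metrics such as revenue and transactions, but not for de-duplicated counts such as users, which can’t simply be added across ranges.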
What About Other Reasons?
The four factors above are often the biggest contributors to data differences. However, there are other areas where users regularly see discrepancies. Here are two additional examples:
GA Sessions vs. DCM Clicks: First, make sure your DoubleClick campaigns are implemented correctly and showing up in GA. Clicks and sessions are fundamentally different. If a user clicks an ad and leaves before the page loads, you’ll never see the session, so keeping page load times fast helps. Because these numbers come from different systems (the click happens on the ad, the session happens on your site), they will never match exactly. If they’re off by a tremendous amount, or the gap swings wildly, bot traffic or other factors could be influencing the discrepancy.
Account Registrations in GA vs. your CRM: If a GA event fires on a button click rather than after server-side validation, your GA registration count will likely be inflated, because it also counts submissions that fail validation. Suggestion: implement the event so that it fires only after the server-side validation succeeds.
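Because clicks and sessions will never match, it’s more useful to watch the ratio between them over time and flag days where it swings. This is a minimal sketch. The input shape and the 25% tolerance are arbitrary assumptions, not a recommended threshold.

```python
from statistics import median


def flag_ratio_swings(daily, tolerance=0.25):
    """daily: list of (day, clicks, sessions). Return days whose sessions-per-click
    ratio differs from the median ratio by more than `tolerance` (relative)."""
    ratios = {day: sessions / clicks for day, clicks, sessions in daily if clicks > 0}
    if not ratios:
        return []
    typical = median(ratios.values())
    if typical == 0:
        return []  # sessions are mostly missing: check the campaign tagging instead
    return [day for day, ratio in ratios.items() if abs(ratio - typical) / typical > tolerance]
```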
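On the server side, the fix amounts to sending the event only on the code path where the registration actually succeeded. In this sketch, `validate`, `create_account`, and `send_ga_event` are hypothetical functions standing in for your own validation, persistence, and GA call (for example, a Measurement Protocol hit).

```python
def register_account(form: dict, validate, create_account, send_ga_event) -> dict:
    """Create an account and record the GA event only after server-side validation passes."""
    errors = validate(form)
    if errors:
        # Rejected submissions never reach GA, so they can't inflate the count.
        return {"ok": False, "errors": errors}
    account_id = create_account(form)
    send_ga_event(category="account", action="registration_complete")
    return {"ok": True, "account_id": account_id}
```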
Where are you seeing inconsistencies in your data? If they’re greater than 10%, allocate resources to investigate.
Once you determine the cause of the data discrepancies, adjusting your implementation, analysis methods, and reporting can help close the gaps between your source of truth and your GA data. Your GA reports should give you directionally accurate data and peace of mind. When you understand how you’re sending data to each source, and have confirmed that you’re making an apples-to-apples comparison (or know when you’re not), it will be a lot easier to trust and analyze your data.