How to track Google Discover in real-time
by Valentin Pletzer - August 1st, 2019 (last updated June 6th, 2020)
The goal of this article is to explain how to set up a real-time approximation of Google Discover traffic in web analytics.
I will focus on Google Analytics (GA), but the approach is much the same for many other web analytics services. Some “intelligent” aggregation will skew your perspective and therefore mislead you. Google Analytics, for example, aggregates pretty much all referred clicks from www.google.com (and other TLDs) as “google / organic”. In the end you won't know whether your traffic came via Google Search or Discover.
After applying the filter described later on you will see something similar to the following screenshot. Users from Google Discover will show up as
com.google.android.googlequicksearchbox/ while traffic from
www.google.com/ is Google Search.
Before you proceed some important information:
Changing your Google Analytics settings might result in data loss and/or interrupt an otherwise continuous data series. Proceed at your own risk! To mitigate the risk I recommend cloning your current in-production analytics view and applying the filter only to the copy.
How to prepare Google Analytics for Discover tracking
As described above, Google Analytics by default aggregates all www.google.xyz referrers into a single line “google”, which drops a lot of information that is precious for differentiating Google Search from the Google Discover app. That is why my recommendation is to set up a filter which overwrites the “campaign source” field.
Figure 1 shows the filter I applied in our Google Analytics view. The important part is the regex applied to the unmodified referrer (named “referral” in GA).
The regex splits a referrer into several groups such as protocol (e.g. “https”), hostname (e.g. “www.google.com”) and, most importantly, the first slash “/” if there is one. The filter itself then writes hostname plus slash into the “campaign source” field. You might prefer to write this data into a custom variable, but overwriting the source lets you monitor the data in the real-time section of Google Analytics.
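The exact expression is only visible in the screenshot of the filter, so the following Python sketch uses a hypothetical pattern (`REFERRER_PATTERN`) with the same three groups: scheme, hostname and an optional first slash. Treat it as an approximation for experimenting offline, not as the filter itself.

```python
import re
from typing import Optional

# Hypothetical pattern, NOT the one from the GA filter:
# group 1 = scheme, group 2 = hostname, group 3 = first slash (may be empty).
REFERRER_PATTERN = re.compile(r"^([a-z][a-z0-9+.-]*)://([^/?#]+)(/?)", re.IGNORECASE)


def campaign_source(referrer: str) -> Optional[str]:
    """Return hostname plus first slash, or None if the referrer does not match."""
    match = REFERRER_PATTERN.match(referrer)
    if match is None:
        return None
    return match.group(2) + match.group(3)


for example in (
    "https://www.google.com/",
    "https://www.google.com",
    "android-app://com.google.android.googlequicksearchbox/https/www.google.com",
):
    print(example, "->", campaign_source(example))
```

Keeping the slash as its own group is what preserves the difference between a referrer with a path and one without.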
Why the slash is so important
When analysing the full referrer-data I noticed something very curious: Some looked like this
https://www.google.com/ (which you might expect) while others were missing a trailing slash
https://www.google.com. While RFC 3986 states that an empty path (aka no slash) is OK, this is highly unusual for browsers, which all send at least a slash as the URL path. So when you get a click from the Google Search webpage you can expect to get
https://www.google.com/ as referrer.
The only likely candidates for
https://www.google.com without slash are apps. And after some research I was able to confirm that indeed the official Google Android app is sending
https://www.google.com as a referrer – but only if a user clicks on a recommendation card (aka Google Discover) since a search will always open a browser/webview with a typical result page and this will send a “normal” referrer.
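The difference between the two variants is easy to check offline. Here is a minimal sketch using Python's standard `urllib.parse`; the function name `has_empty_path` is made up for illustration.

```python
from urllib.parse import urlsplit


def has_empty_path(referrer: str) -> bool:
    """True if the referrer has a host but no path at all (not even "/")."""
    parts = urlsplit(referrer)
    return bool(parts.netloc) and parts.path == ""


print(has_empty_path("https://www.google.com"))   # no trailing slash
print(has_empty_path("https://www.google.com/"))  # typical browser referrer
```

On its own an empty path only hints at app traffic; it does not prove that the click came from Discover.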
The Google iOS app sadly doesn’t send a referrer at all for Discover clicks, so this traffic will show up as “(direct) / (none)” in Google Analytics. You can improve detection a tiny bit by looking at the AMP viewer (see the “Even more tracking extravaganza” part of this article).
A complete list of Google referrer mess
After some promising discoveries I set out to build a complete list of Google services and their referrers. As it turns out, Google referrers are a complete and utter mess. This is what it looked like at the beginning of June 2019.
| Source | Referrer |
|---|---|
| Google App (Discover), iOS | no referrer |
| Google App (Search), iOS | https://www.google.com/search?q=.... (not cut) |
| Google News App | https://news.google.com/ |
| Google Chrome (Articles for you) | https://www.googleapis.com/auth/chrome-content-suggestions |
| Google Chrome (Search) | https://www.google.com/ |
| Android Google App (Discover) | https://www.google.com (no trailing slash) |
| Android Floating Search Bar | https://www.google.com (no trailing slash) |
| Android Google Chrome (Search) | https://www.google.com/ |
| Android Google Chrome (Articles for you) | https://www.googleapis.com/auth/chrome-content-suggestions |
| Android Google News Widget | https://news.google.com/ |
| Android 9 | full referrer |
| Android 9 Google App (Discover) | android-app://com.google.android.googlequicksearchbox/https/www.google.com |
| Android 9 Google App (Search) | android-app://com.google.android.googlequicksearchbox (no trailing slash) |
| Android 9 Floating Search Bar | android-app://com.google.android.googlequicksearchbox (no trailing slash) |
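To make the list easier to work with, here is a rough Python sketch that maps a raw referrer onto the rows of the table. The function name and the returned labels are invented for this example, and the rules only reflect the state of the table; they will need adjusting whenever Google changes its referrers again.

```python
from urllib.parse import urlsplit

QUICKSEARCHBOX = "com.google.android.googlequicksearchbox"


def classify_google_referrer(referrer: str) -> str:
    """Map a raw referrer to a (made-up) label, following the table above."""
    if not referrer:
        # The iOS Google App sends nothing for Discover, but so does direct traffic.
        return "no-referrer"
    parts = urlsplit(referrer)
    host, path = parts.netloc.lower(), parts.path
    if parts.scheme == "android-app" and host == QUICKSEARCHBOX:
        # Android 9: only Discover carries a path after the package name.
        return "discover-android-9" if path else "google-app-search-or-search-bar"
    if host == "news.google.com":
        return "google-news"
    if host == "www.googleapis.com" and path.startswith("/auth/chrome-content-suggestions"):
        return "chrome-articles-for-you"
    if host.startswith("www.google."):
        # No trailing slash: Android Discover or the floating search bar.
        return "discover-or-search-bar" if path == "" else "google-search"
    return "other"


print(classify_google_referrer("https://www.google.com"))
print(classify_google_referrer("https://www.google.com/"))
```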
If you look into the history of Google referrers you will find even more mess. For example, Google News used to be news.url.google.com, but a bug changed it to plus.url.google.com for several weeks before it changed back again. At the same time you might have seen news.google.com as a referrer as well. Mobile Chrome recommendations changed to discover.google.com in November 2018 and then back to www.googleapis.com in December. Also, it is quite difficult to get a clear answer from anyone at Google, but it looks like Chrome suggested articles are a separate beast and not part of Google Discover.
Even more tracking extravaganza
Tracking in Accelerated Mobile Pages can be achieved by using the amp-analytics component. Your implementation might look different (e.g. setting up the tracking within an iframe), but either way you probably already add some custom variables to your tracking pixel, and then this should be easy. If not, have a look at the documentation, specifically the section on extra URL parameters.
Not very well known but very useful are the AMP HTML URL Variable Substitutions. In our case it is worth having a look at the “viewer” variable. The documentation states “[viewer] provides an identifier for the viewer that contains the AMP document. An empty string is provided when the document is loaded directly in the browser or if the id is not found.” This means that if your document is loaded not from your server but via a cache (e.g.
https://foo-com.cdn.ampproject.org/c/s/foo.com/amp_document.html), a variable called “viewer” is most probably available, telling you which viewer is displaying the document.
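As a rough illustration, the Python sketch below builds an amp-analytics JSON configuration that passes the viewer along as an extra URL parameter. The property id `UA-XXXXX-Y` is a placeholder, and the parameter name `cd157` assumes the value should land in Google Analytics custom dimension 157, the example dimension used in this article; check both against your own setup and the amp-analytics documentation before relying on it.

```python
import json

# Assumptions: "UA-XXXXX-Y" is a placeholder property id and "cd157" targets
# custom dimension 157; neither is taken from a real configuration.
amp_analytics_config = {
    "vars": {"account": "UA-XXXXX-Y"},
    # ${viewer} is substituted by AMP with the identifier of the viewer.
    "extraUrlParams": {"cd157": "${viewer}"},
    "triggers": {
        "trackPageview": {"on": "visible", "request": "pageview"},
    },
}

# The resulting JSON would go inside the <amp-analytics> element's script tag.
print(json.dumps(amp_analytics_config, indent=2))
```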
If you don’t know how AMP caches work you might want to have a look at this great explanation: How AMP pages are cached
Figure 2 shows our filter setup once the viewer variable is available as a value in a custom dimension (in this case custom dimension 157 is named ampViewer_h157). The filter pushes the data from the custom dimension into “campaign term” and appends the name of the client's operating system. The reasoning behind this is a) to have this data available in “real-time” and b) to differentiate between Android and iOS users.
Depending on your setup you might not use “campaign term” anyway, since Google hasn't sent the full referrer for a very long time now and the “q” parameter, which includes the search query/keyword, isn’t available to parse. At least that is the official story. There is an exception to this: on iOS, Google has to use the webview as a browser within its Google App. People who visit your page from within this webview will send the full referrer, since Apple doesn’t seem to honor the meta tag which should cut off all the GET parameters in the URL.
So what's the deal with the AMP viewer? If you look at the table above you will notice that traffic from the iOS Google Discover app doesn't send a referrer, but it does set an AMP viewer, which is
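For those uncut iOS referrers, the search query can be read straight from the URL. This is a small sketch with Python's standard library; the function name is invented, and it assumes the keyword sits in the “q” parameter as described above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def search_query_from_referrer(referrer: str) -> Optional[str]:
    """Return the value of the "q" parameter, or None if it is missing."""
    values = parse_qs(urlsplit(referrer).query).get("q")
    return values[0] if values else None


print(search_query_from_referrer("https://www.google.com/search?q=google+discover"))
print(search_query_from_referrer("https://www.google.com/"))
```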
https://www.google.com/. So if you use AMP and Google Discover does indeed send you iOS traffic this is how you will know.
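Put together, the heuristic could be sketched as follows in Python. The function and parameter names are made up, the operating system string is assumed to be reported as “iOS”, and the result is only a likely guess, since other traffic without a referrer could match as well.

```python
GOOGLE_AMP_VIEWER = "https://www.google.com/"


def is_probable_ios_discover(referrer: str, amp_viewer: str, operating_system: str) -> bool:
    """Guess iOS Discover traffic: no referrer, but the Google AMP viewer is set."""
    return (
        referrer == ""
        and amp_viewer == GOOGLE_AMP_VIEWER
        and operating_system.lower() == "ios"
    )


print(is_probable_ios_discover("", "https://www.google.com/", "iOS"))
print(is_probable_ios_discover("https://www.google.com/", "https://www.google.com/", "iOS"))
```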