The fact that Google hides so much click data in Search Console has aroused plenty of suspicion. But could it actually be doing an OK job in terms of presenting data that’s useful? We took a look at over 560,000 URLs across our client base to find out.
Data quality is a problem for most marketing channels, but the world of SEO has a few challenges that are particularly frustrating. The issue of hidden clicks in Google Search Console is definitely one of these, and this was recently brought back into the spotlight in this enlightening article by Ahrefs’ Patrick Stox.
Most SEOs consider GSC their source of truth. Do you trust the data to be accurate?
We ran a study and across 146,741 websites and nearly 9 billion total clicks. Almost half of the clicks go to terms they don't show you. https://t.co/0BF0vsMzYW
— Patrick Stox (@patrickstox) June 24, 2022
Here, he reports that after a month of pulling data from more than 146k websites and nearly nine billion clicks, GSC only attributed 46% of those clicks to actual keywords. The rest of the clicks went to hidden queries.
But what exactly does this mean, and what, most importantly, does it tell us about the way Google chooses to show us the information it collects? Let’s take a closer look.
This isn’t a new issue. It’s long been common knowledge that GSC obfuscates a good chunk of the data showing which keywords actually result in clicks to any given website. What’s really surprising to most people, though, is just how much is hidden, and how much it varies between websites for no apparent reason.
In his article, Stox talks about how wide the range of missing data can be for various sites – 90% missing for a site with 100 million clicks, and 2% missing for one with 63 million – and on the whole, we’ve found the same:
In the visualisation above – created by our in-house Data Scientist Dr. Joshua Prettyman – each box plot represents a Blink client and the range in which their missing keyword data stretches. The bottom of the whisker (as they’re known) illustrates the minimum amount of missing data, while the top is the maximum.
As you can see, for some clients this range is huge, and it tends to go up or down month by month. In this example we can see that for this client the lowest month saw around 18% of missing click data, and at its highest nearly 60%.
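For anyone who wants to reproduce this for their own property, the underlying metric is simple: it’s the gap between a property’s total clicks and the clicks GSC attributes to specific queries. A minimal Python sketch – the export names and figures below are illustrative assumptions, not our data:

```python
# Sketch of how a "missing clicks" percentage can be derived, assuming two
# Search Console exports per month: total clicks aggregated by date, and
# clicks broken down by query (which silently drops anonymised queries).
# All names and numbers here are illustrative, not from our dataset.

def missing_click_pct(total_clicks: int, query_attributed_clicks: int) -> float:
    """Share of a property's clicks with no query attached, as a percentage."""
    if total_clicks == 0:
        return 0.0
    return 100.0 * (total_clicks - query_attributed_clicks) / total_clicks

monthly_totals = {"2022-05": 10_000, "2022-06": 12_500}
monthly_query_clicks = {"2022-05": 5_800, "2022-06": 9_900}

for month, total in monthly_totals.items():
    gap = missing_click_pct(total, monthly_query_clicks[month])
    print(f"{month}: {gap:.1f}% of clicks unattributed")
```

Run monthly, this gives you the same month-by-month gap series the box plots above are built from.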
Interestingly, the patterns don’t appear to be seasonal, which was our first suspicion. Year on year changes are not even slightly consistent, which is what you’d expect if this was the case. There are no other trends that immediately stand out – it all seems frustratingly random.
At this point, it may make sense to take a step back and ask, “why does this matter?” After all, how much of a difference does some missing data make? The short answer – quite a lot.
First of all, if the average site is missing 50% of its keyword data, that’s a huge blind spot. This specific data is used to show which terms are actually driving traffic to a site. It helps us understand what’s working, what isn’t and where we should focus the – not inconsiderable – effort that’s needed for an eCommerce SEO project.
Secondly, Google says that it hides the data for some queries for reasons such as privacy. And while we’d expect that to happen on some scale, privacy alone can’t plausibly account for upwards of 50% of queries being hidden.
So what information is Google hiding? And why?
What does Google say about the issue?
Google’s position here is a little vague, but its overall stance is that the discrepancy is down to privacy, or that the hidden queries are only made a handful of times and offer little in the way of insight.
The former is likely true to an extent – there are, on rare occasions, phrases in Search Console related to employee names or addresses, for example.
As for the niche, long tail keywords that would account for the rest, it also makes some sense. After all, Google has said previously that 15% of all search queries are being made for the first time.
However, there are plenty of reasons not to take this at face value. Google has frequently been accused of trying to push marketers towards paid advertising, and the recent LinkedIn post from Rand Fishkin seems to sum up the reaction many in the industry had to the Ahrefs article.
In what was most likely a response to the noise surrounding this, Google updated its Search Console documentation earlier this month. Here, it changed its description of anonymised queries to refer to “some” queries instead of “very rare”, while also adding in a paragraph stating that only “top data rows” are stored in Search Console, as opposed to all data.
Heads-up: Google just added some new info to their Search Console performance report doc. In the ‘Anonymized Queries’ section, more details have been added about query limits related to storing “top data rows” (not all data rows) 😒https://t.co/VCEeXpiYXk pic.twitter.com/4WiPBq4mO5
— Brodie Clark (@brodieseo) July 10, 2022
This is a pretty clear shift from the previous wording, which was along the lines of “we only hide a bit of data, and that’s only stuff that doesn’t really matter.” In another tweet, Google’s search liaison Danny Sullivan summed it up as “collectively, [these terms] can be substantial…”.
Again, very rare as I read that page was a reference that individual queries can be … very rare. They happen once or twice, that's it. But as that page also said, collectively they can be substantial….
— Danny Sullivan (@dannysullivan) July 12, 2022
Missing clicks by page type
Now that we’ve got all of this out of the way, let’s get down to the interesting stuff.
What if, for example, it looks like your highest value pages are affected the most? Could all the criticism being levelled at Google be justified? Could this be a sign that it really is trying to hide information so that we’re more likely to give up on SEO and go all in on Google Ads?
Obviously this is all wild conjecture, but it’s a fun question to ask all the same. However, working to diagnose missing data isn’t easy. It’s a bit like the urban legend that everyone swallows eight spiders in their sleep each year: as if that figure could be calculated by counting the missing spiders in your bedroom.
The point is that you can’t count what’s not there, so it comes down to using what points of reference we do have to see if that sheds any light on the issue.
In this case, since we work almost exclusively with eCommerce sites, we decided to focus on which types of pages are affected most.
Many of our projects have a standardised URL structure. For example, on Shopify stores, usually if the URL contains /collections, it’s a category, or if it contains /products it’s a product.
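As a sketch, that classification rule can be expressed as a simple path check (the path segments here are Shopify conventions and will vary on other platforms):

```python
def classify_url(url: str) -> str:
    """Bucket a URL into a page type using common Shopify path segments.

    Products are checked first so that combined paths like
    /collections/x/products/y are counted as products.
    Anything unmatched falls into "other".
    """
    if "/products/" in url:
        return "product"
    if "/collections/" in url:
        return "category"
    if "/blogs/" in url or "/blog/" in url:
        return "blog"
    return "other"

print(classify_url("https://shop.example/collections/pasta-machines"))  # category
```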
By filtering for blogs, categories and products across all our clients in our data warehouse, we ended up with just over 560,000 URLs. It’s obviously not the whole internet, but as sample sizes go it’s certainly large enough to draw meaningful conclusions from.
Our visualisation of this can be seen below. Here the missing clicks are broken down by type instead of client, again with minimum, maximum and median values.
The striking thing here is how much tighter these variations are. Excluding “other” – which is a general bucket of pages that don’t fit into a product, category or blog – these page types all have a variance between the minimum and maximum of less than 10%. Compared to the overall data, this is quite a shift.
Next, let’s look at the median values for our three main page types:
- Product – 55.16%
- Category – 22.66%
- Blog – 53.74%
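Medians like these fall out of a simple grouping once every URL has a click-gap figure and a page type attached. A pure-Python sketch with made-up numbers (not our actual dataset):

```python
from collections import defaultdict
from statistics import median

# Illustrative rows only: (page_type, missing_click_percentage per URL)
rows = [
    ("product", 54.0), ("product", 56.3), ("product", 55.2),
    ("category", 21.9), ("category", 23.4), ("category", 22.7),
    ("blog", 52.8), ("blog", 54.6), ("blog", 53.7),
]

# Group the per-URL gaps by page type, then take the median of each group
by_type = defaultdict(list)
for page_type, gap in rows:
    by_type[page_type].append(gap)

for page_type, gaps in by_type.items():
    print(f"{page_type}: median missing clicks {median(gaps):.2f}%")
```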
So what do these numbers mean? Let’s start with blogs.
On average, 54% of clicks to blog pages in our dataset aren’t attributed to a query. But if we’re using Google’s rationale that these are going to be long tail terms that aren’t useful to report, what could that look like in the real world?
Below is an example of terms extracted from Search Console for a single blog URL. Again, we can’t see phrases that aren’t there, but what is clear is that there is some noise created by permutations and misspellings of a few distinct phrase types.
The next table shows the pages with the biggest click gap across our projects over the past 30 days. Overwhelmingly, they’re blogs.

Because of client confidentiality, demonstrating what we can see here is a bit tricky. But diving into the top results produced a few interesting observations:
- The blog that had the biggest click gap ranked on average in positions 4.1, 3.6 and 4.8 for the brand names of three companies in unrelated industries. One of these has an estimated search volume of 4m per month.
- The second result on our list is a blog discussing the dimensions of a particular product. Over the last 30 days it ranked for several number combinations, such as “15 x 40”. Most likely these are users searching for Google’s calculator feature.
- The third was a blog that mentioned various foods, and seems to have had a brief first page ranking for short tail terms like “onion” or “potato”.
Given the broad subject matter covered in many blogs (and this one is probably no exception) it shouldn’t be surprising that things like this happen. In fact, our data showed that on average a blog page ranked for 106.25 queries in Search Console. For a category page this was 24.93 queries, and a product 13.21.
Again, this is an interesting observation. From our point of view, category pages are by far the most valuable when it comes to eCommerce SEO (this blog estimates that they get more than 400% more traffic than product pages, for example).
This traffic usually has a high intent to purchase – think “buy lawnmower” for a category page compared to “how do I mow a lawn” for a blog. This is why much of our strategy is focused on improving category pages, either by creating new ones or optimising those that already exist.
So to recap, our data shows us that on average a category page (which from an SEO point of view is very high value) has 22.7% of queries hidden compared to 53.7% for a typical blog page. However, a blog has more than four times as many ranking queries, although many of them are unlikely to ever lead to a sale.
One argument that could be made here is that Google is actually doing a reasonable job of cutting out the noise, showing us a proportionally much higher share of commercially relevant information rather than hiding it.
It’s not as exciting as uncovering a grand conspiracy, but at least it makes sense. If resources are limited – and even one of the world’s biggest tech companies doesn’t have infinite capacity – then prioritising like this seems a reasonable way to do it.
Now is a good time to insert some caveats. This isn’t a scientific study, just some observations based on what we can see from the data in front of us. Also, there’s still the issue of product pages. At 55.1%, this page type has the most queries hidden on average despite the smallest average number of ranking queries (13.21).
Here, we can see the ranking queries for a typical product page. This is a branded product available on multiple stores and follows a fairly standard pattern, with a product title that consists of brand (Imperia), product description (pasta machine) and product ID (150).
Here, the click gap is around 40%. But we can see that the page has ranked for the term “sp150”, a phrase that could mean all kinds of things to the searcher.
One theory is that because product pages are by nature extremely long tail and specific, the chances of them appearing for irrelevant searches are much higher. There simply aren’t many results for Google to show for highly specific phrases, which means that if a product listing is even tangentially related, there’s a chance it will be shown. Irrelevant queries are therefore bound to slip through and be hidden from the click data.
Again, this is just a theory at the moment. Of course, there’s also the possibility that Google views these pages as the most commercially valuable and is indeed deliberately hiding the data. If that were the case, though, you would expect category pages to be the worst affected, and they aren’t.
There’s a lot more digging that can be done here. And even if all of this holds up, we’re not claiming that no commercially important data goes missing. It almost certainly does – there are always going to be instances where Google gets it wrong.
The broader point here is that while 50% of clicks missing from Search Console may look like a shocking number, there’s probably a good chance that most of that is irrelevant. There’s likely to be some useful stuff in there for sure, but in many cases it’s going to be a better use of time and effort to focus on what data is there instead of trying to count the missing spiders in your bedroom.
Have you found your business is missing the type of data we outlined above? Get in touch today to see how Blink SEO and our data team can get to the root of the problem.