In my personal dictionary, I have defined Google as “Excellence in Engineering”. I can’t fully express my admiration for their products, engineering practices & innovation culture. Yet in this much-admired Google, I uncovered a silly bug, and in their core search engine product at that. Are you surprised? Me too.
I recently launched my pet project, http://gceasy.io/, as a free online tool. It parses the JVM’s garbage collection logs & publishes interactive graphical visualizations of them.
I was curious to see how search results are rendered for this tool, so I googled the term “gceasy.io”. Note: I searched for “gceasy.io” (i.e. with double quotes) instead of just gceasy.io. When a term is searched with double quotes, Google does an exact-match search.
As always, Google displayed the search results in a matter of milliseconds. At the bottom of the screen, the pagination section indicated that there were 3 pages of search results. Please refer to Figure 1 below.
Figure 1: Page #1 of Google Search Results for “gceasy.io”. Pagination section showing 3 pages.
For a newly launched tool, I was excited to see 3 pages of search results. So after quickly skimming through the results on the first page, I clicked on the #2 hyperlink in the pagination section and was taken to the second page of the search results. After reading through the second page, I looked for the page #3 and “Next” hyperlinks in the pagination section. To my surprise, those hyperlinks were missing. Oh, wow. Please refer to Figure 2 below.
Figure 2: Page #2 of Google Search Results for “gceasy.io”. Note that in the pagination section the page #3 and “Next” hyperlinks are missing.
I couldn’t believe it. As an avid follower of the Google Test Automation Conference (GTAC), their SET (Software Engineer in Test) practices, the Google Testing Blog, and the textbooks they publish on test engineering, I was surprised by this silly bug, and in their core product at that. Maybe the old Indian saying “Even an elephant might lose its grip while walking” is true.
Benefit of the doubt
In page #2, above the pagination section it was mentioned that: “In order to show you the most relevant results, we have omitted some entries very similar to the 14 already displayed. If you like, you can repeat the search with the omitted results included.”
So I thought that if I clicked on the hyperlink “repeat the search with the omitted results included”, I might see 3 pages of search results. Giving Google the benefit of the doubt, I clicked on this hyperlink, and I saw 5 pages of search results instead of the originally mentioned 3 pages. Please refer to Figure 3 below.
Figure 3: 5 pages of search results shown after clicking on the hyperlink “repeat the search with the omitted results included”
This is not a new thing; in fact, I recall Google doing this for more than 10 years now.
I don’t remember the rationale for it, but I seem to remember it’s the same as for the Gmail email count when you browse your labels or a search result. It’s an estimate computed ahead of time and cached, and it may or may not match the actual results.
So, it is indeed a wonder of engineering, but a confusing one.
The same goes for the number of search results itself, which is now grossly estimated / rounded, if you will.
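To make the idea of a cached estimate concrete, here is a minimal sketch of a pager that draws its page links from an estimated total until the real total is known. This is only an illustration of the behaviour described in the post, not Google's actual implementation: the function names `page_count` and `pager`, the page size of 10, and the estimate of 25 are all assumptions. The figure of 14 is the number of displayed results quoted in the post.

```python
import math

def page_count(result_count, page_size=10):
    """Number of pages needed to show result_count results."""
    return max(1, math.ceil(result_count / page_size))

def pager(estimated_total, known_total=None, page_size=10):
    """Return the page numbers to render as links.

    estimated_total is a cheap, cached guess (hypothetical).
    known_total is only available once the engine has actually
    walked to the end of the de-duplicated result list.
    """
    total = known_total if known_total is not None else estimated_total
    return list(range(1, page_count(total, page_size) + 1))

# Page 1: only the cached estimate (assumed to be 25) is available.
print(pager(estimated_total=25))                  # [1, 2, 3]

# Page 2: the engine ran out after 14 de-duplicated results,
# so the link to page 3 and "Next" disappear.
print(pager(estimated_total=25, known_total=14))  # [1, 2]
```

Under these assumptions the pager shows 3 pages up front and silently shrinks to 2 once the true count is known, which matches what the screenshots show.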
This is a known trait of search engine technology. In a distributed system, the cost of sorting results grows the deeper we page, because every shard has to return all of its top hits up to the requested page before they can be merged. You can see that an “off the shelf” search product, Lucene, handles pagination in a similar way:
“Do not retrieve all documents if you actually need to work only with some portion of them”
They actually have properties that let you trade off how accurate you want your page numbers to be against how fast you want your results. Typically you want to maximize speed at the cost of some incorrect page numbers.
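The cost of deep paging in a distributed index can be sketched in a few lines. The code below is a toy model, not Lucene's or Google's code: the "shards" are plain lists of random scores, and `search_shard` and `search_page` are hypothetical names. It shows that to serve page N, each shard has to hand back its own top (offset + page size) hits, so the merging node's work grows with both the page depth and the number of shards.

```python
import heapq
import random

def search_shard(shard_scores, offset, size):
    """A shard cannot know which of its hits survive the global merge,
    so it must return its own top (offset + size) scores."""
    return heapq.nlargest(offset + size, shard_scores)

def search_page(shards, page, size=10):
    """Return one page (1-based) of globally sorted scores, plus the
    number of candidates the merging node had to look at."""
    offset = (page - 1) * size
    candidates = []
    for shard_scores in shards:
        candidates.extend(search_shard(shard_scores, offset, size))
    # Merge all candidates, then keep only the requested slice.
    merged = heapq.nlargest(offset + size, candidates)
    return merged[offset:offset + size], len(candidates)

random.seed(0)
# 5 toy shards with 1000 scored documents each (made-up data).
shards = [[random.random() for _ in range(1000)] for _ in range(5)]

for page in (1, 10, 100):
    hits, examined = search_page(shards, page)
    print(f"page {page}: {len(hits)} hits returned, {examined} candidates merged")
```

In this toy setup, page 1 needs 50 candidates, page 10 needs 500, and page 100 needs 5000, all to return the same 10 hits. That is one plausible reason an engine would rather estimate the page count than compute it exactly.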
Maybe this is all to get more hits on the website mentioned in the picture. Just a guess!
I am having the same problem with Google omitting some of my pages; I don’t know how to solve the problem.