The Aggregation Paradox, Part 2

In a previous post I outlined the Aggregation Paradox and its ramifications for a variety of online businesses.

A curious fact is that one category of web companies has been able to completely avoid being aggregated. It’s really quite fascinating, because this category includes the biggest aggregators of them all – the search engines.

Theoretically, a company with very little infrastructure could crawl Google’s search result pages (SERPs) and provide an almost identical service to Google, sans the costs involved in crawling and indexing the entire web. Moreover, such a company could scrape and aggregate results from multiple search engines and provide a superior user experience at a fraction of the cost.
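For illustration only, here is a minimal sketch of what that aggregation step might look like – merging the ranked result lists returned by several engines into a single list. The fetch_results stub and the choice of reciprocal rank fusion are my own assumptions, not anything the engines actually expose (and, as the terms quoted below make clear, fetching those results without permission is exactly what they prohibit).

```python
# Minimal sketch of the aggregation step a meta-search engine would need:
# merging ranked result lists from several engines into one fused ranking.
# fetch_results() is a hypothetical placeholder -- in practice each engine's
# results would have to come from a paid API or (against the T&Cs quoted
# below) from scraping its SERPs.

def fetch_results(engine: str, query: str) -> list[str]:
    """Hypothetical stub: return a ranked list of result URLs for `query`."""
    raise NotImplementedError("plug in a per-engine results source here")

def fuse_results(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists with reciprocal rank fusion (RRF)."""
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # Higher fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))

def meta_search(query: str, engines: list[str]) -> list[str]:
    """Query every engine and return one fused, de-duplicated ranking."""
    return fuse_results([fetch_results(engine, query) for engine in engines])
```

The point of the sketch is how little is involved: the fusion logic is a few lines, while all the expensive work (crawling, indexing, ranking) has already been done by the engines being aggregated.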

I know – this all sounds like the good old meta-search engines, which have been around since, well, just about as long as search engines themselves. But meta-search never really took off, while aggregation in so many other categories has in a big way. Why is that? True – there are some technological barriers that make search trickier to aggregate than some other categories (size of the index, dynamic blurbs, etc.). But those are all solvable. So what really makes search aggregation different?

Simple – the search engines, being the mothers of all aggregators, understand better than anyone else the Aggregation Paradox and the long-term risks it imposes on those who get aggregated.

The engines have all jumped through hoops over the years to prevent anyone from aggregating them in a serious way. They’ve placed legal, economic and technical barriers to prevent aggregation. Here, for example, is a piece from Google’s search T&Cs:

No Automated Querying
You may not send automated queries of any sort to Google’s system without express permission[1] in advance from Google. Note that "sending automated queries" includes, among other things:

  • using any software which sends queries to Google to determine how a website or webpage "ranks" on Google for various queries;
  • "meta-searching" Google; and
  • performing "offline" searches on Google.

Please do not write to Google to request permission to "meta-search" Google for a research project, as such requests will not be granted.

Kind of an interesting policy for a company that’s built entirely on NOT applying ANY of these restrictions to the sites that it crawls and aggregates. BTW – I can personally testify that this is probably one of Google’s most enforced terms. Try to aggregate a few search result pages, and Google’s legal team will come after you within minutes.

How do the search engines get away with this? Why isn’t anyone calling them out?! The answer lies, of course, in the Aggregation Paradox, and in who gets it (Google, Yahoo, etc.) and who doesn’t (mostly everyone else that’s addicted to short-term traffic…).

More on the Aggregation Paradox, and possible solutions to it, soon.

[1] "Express permission" = paying Google so much per query that a meta-search will find it very hard to earn anything…
