One of the rewards of writing a blog is the occasional detailed comment that readers (yes, this blog has more than one casual reader) send in. One such comment was elicited by my earlier post “The World is (Information) Fat.”
Uday wrote in:
As usual, a thought provoking article that makes me periodically check this blog!
Now don’t go thinking that I bribed Uday into writing that.
I would like to question the validity/implications of points 4, 6, and 7.
For reference, here are those points:
Fun Fact #4: Regarding the quality of information online: as the quantity is increasing, the variance is increasing and the average is decreasing.
Fun Fact #6: The cost of identifying the information is going up.
Fun Fact #7: The cost, therefore, of obtaining knowledge has gone up.
As we all know, search efficacy has been taken to new levels by Google; and Inktomi/Yahoo! and MSN are not slacking off either. I wonder if you are implicitly questioning the limitations of the ranking algorithms used by these companies. As the amount of information increases by leaps and bounds, the total quantum of search results for a particular phrase will keep increasing proportionally. But I would suppose the key challenge is to rank-order the search results — after all, how many people go beyond the first few pages of any search result?
My claim in #4 is restricted to the average quality and the total quantity of information available, with the former coming down and the latter increasing exponentially. Naturally the good stuff is harder to find. Even if the search is aided by Google, with its admittedly excellent page-ranking algorithm, the results can be less satisfactory than if the search domain were restricted, as it was earlier when there was less information overall. My unsubstantiated claim is that the combined effect of increased volume (negative) and more efficient search engines (positive) on the average quality of the information found is, on the whole, negative.
Put another way, would there ever be a real need to sift through information that is ranked beyond the top 50, or would a person be better off refining the query to zero in more effectively on what was sought? I believe it is the latter; hence the amount of information should not affect the quality of refined queries. But yes, refined queries will increasingly be needed to replace coarse and aggregate ones.
I agree that a refined search would yield a more accurate result, of course. But that refining of the search is exactly what I mean when I say that the search cost increases.
The explosion of consumer choice is a good thing, overall. But there are implicit costs associated with choosing from a large menu as opposed to a limited but excellent one. Among those implicit costs is the need to be a more discerning consumer.
I contend that this is an “intellectual” cost. It will be simplified for the masses if leading-edge companies can provide intuitive means to elicit query refinement; i.e., an important question is how the search companies could provide artifacts beyond the existing ones to allow users to phrase their queries better and obtain exactly what they seek. Google, for one, has done tremendously well on such aspects. Google Suggest offers completions before a search phrase is fully typed, the “:” operator prefixes are extremely handy and work very effectively, and the seamless carry-over of context to News, Images, Local, etc. is so convenient. I don’t know if I am missing your point, Atanu, but I reckon that bright engineers have hit upon very good metaphors to enable searching for exactly what one seeks. I grant that these tools can be expanded upon, and are taken for granted by the programming-savvy; if and how the larger population can effortlessly imbibe these skills depends on the user interface exposed and the rate of exposure (so as not to intimidate the consumer base).
I don’t think you are missing my point. We are merely working from different sets of assumptions. The object of a search is to find something that you, by definition, do not know. Therefore the best you can do is define the boundaries of the search. Having defined the boundary, all else remaining the same, the denser the information set, the more numerous will be the results obtained.
Another question is whether people in general have the communication skills to express exactly what they seek. It is very instructive to look at some of the results on Google Answers and see what the experts searched on to provide the answer. People seem to be all over the spectrum when it comes to the ability to express their needs smartly and succinctly. As with most things in life, yet again, we run into the 80-20 rule. :-)
In my conjectures in the piece, I was working with averages. My contention is that the sophistication of the average consumer may not have improved significantly, and that the variance has increased. Hence the conjecture that, on average, finding information is more costly now.
Finally, would you have any stats or thoughts on what proportion of the growth of information is on largely “new” topics and how much represents “accretive” bloat? This is surely an ill-formed question because the definitions of ‘new’ and ‘accretive’, in this context, are themselves fuzzy.
Excellent point. New topics would have less aggregate information and therefore greater average quality and lower variance. They would not have had time to accumulate what you call accretive bloat. I think there may be some equivalent of Zipf’s Law (a rank-size distribution) that could shed light on how the growth of information is distributed among topics that are new as opposed to old.
Imagine that each topic is ranked from the most recent to the most ancient, and associate with each topic the total amount of information available; a bit of regression analysis will then yield the exact law.
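The regression I have in mind can be sketched in a few lines. This is purely illustrative: the topic data below are synthetic (I invented the rank-size exponent and the noise level), but it shows how, given real per-topic information volumes, a log-log least-squares fit would recover a Zipf-like power law.

```python
# Illustrative sketch: rank topics from newest to oldest, associate each
# with a total information volume, and fit a power law on a log-log scale.
# All data here are synthetic -- the exponent 1.2 is an assumption, not a claim.
import numpy as np

rng = np.random.default_rng(0)

ranks = np.arange(1, 201)          # topic 1 = most recent ... 200 = most ancient
true_exponent = 1.2                # hypothetical: older topics have accreted more
volume = 1e6 * ranks**true_exponent * rng.lognormal(0.0, 0.1, ranks.size)

# least-squares fit of log(volume) = a + b * log(rank)
b, a = np.polyfit(np.log(ranks), np.log(volume), 1)
print(f"fitted exponent: {b:.2f}")  # recovers a value near true_exponent
```

With real data one would, of course, worry about how to delimit a “topic” and measure its information volume — which is exactly where the fuzziness Uday points to comes back in.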
[Disclaimer: I don’t work for a search company, and may be demonstrating gross ignorance of progress in the area].
Hey, my ignorance will leave your ignorance in the dust.