The World is (Information) Fat

“Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.”
— Samuel Johnson quoted in Boswell’s “Life of Johnson.”

If you think about it for a moment, what we really want is knowledge, not information. (Recall what the business school guru said: people don’t want a quarter-inch drill; they want a quarter-inch hole.) The good news is that there is a lot of information out there. The better news is that the cost of accessing that information has been dropping exponentially. The bad news is that the cost of searching through that vast stock of information to satisfy your knowledge needs is increasing.

First, a brief aside on the distinction between knowledge and information. People use the terms interchangeably, but they must be distinguished if we wish to reason with any clarity about our information-suffused modern society. A telephone book has information about names and numbers, but it does not ‘know’ telephone numbers. A human brain, in contrast, ‘knows’ a phone number. Outside the human brain, it is information; organized within the structures of a human mind, it is knowledge. Dr Johnson appreciated the distinction very keenly. Information is what economists call a public good, while knowledge is a private good. This has important implications, which need not detain us right now.

Back to the good news about information. Here are some fun facts about information.

Fun Fact #1: There is a heck of a lot of information in the world today. The stock of information is stupendous, naturally so, because human activity primarily produces information. The increase in the production of information has been both intensive and extensive.

Intensive because more of what we do is recorded, whether in government files, in private-sector databases, or in your own records of your private life such as blogs and photo albums. Far more bits of information are associated with you than were associated with your ancestors.

Extensive because, first, there are more of us on the planet today and, second, more of us are doing things that produce information. A much larger percentage of our six billion people is engaged in generating and processing information than ever before. Millions of researchers and scientists do work that generates truckloads of information. Commercial, governmental, and personal information grows without bound.

Fun Fact #2: A heck of a lot of the available information is online. Some of it is on the World Wide Web, accessible through the internet, while much of it is in the deeper web, not generally accessible to the average web surfer. As time goes on, a greater percentage of the total stock of information will be online.

Fun Fact #3: The stock of information is increasing exponentially, and consequently the stock of online information is also increasing exponentially. Exponential increases are fairly dangerous things, and this not-so-fun fact will come back to bite us. The flow of information (the increase in the stock) threatens to become a tsunami.

Fun Fact #4: As for the quality of information online: as the quantity increases, the variance increases and the average decreases.

Let’s just focus on books, although we could as easily tell a similar story about movies, research papers, photographs, blogs, or Usenet postings. A century ago, very few books were published compared to today. Given the high cost barrier of publishing, only works of some enduring quality made the grade, so the average quality of available printed matter was high. Today, millions of titles are published because of greater supply (more writers), greater demand (more readers), and a lower cost of publishing relative to average incomes. The quality of the average book, I believe, is lower than before.

It is my contention that the best book of today is better than the best book of yesterday, and that the worst book of today is worse than the worst book of yesterday. This is just a hunch; I don’t have hard data to support it.

Allow me to make a hand-waving argument about quality and quantity. Take photographs. When I used an analog camera (prints or slides), I took far fewer pictures than I do today with my digital camera, but I also rejected a much lower percentage of them. Today, I throw away most of what I take with my digital camera, yet I end up with much better “best pictures” than before. Taken as a whole, my digital pictures are of lower average quality because I take pictures with much greater abandon, given the low cost of each click.
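The photograph argument is, at bottom, a claim about sampling: take many draws from a lower but wider quality distribution, and the average falls while the best draw improves. The Python sketch that follows illustrates this under loudly stated assumptions: each picture’s quality is a normally distributed score on an arbitrary 0-100 scale, and the shot counts, means, and spreads in the hypothetical ANALOG and DIGITAL settings are invented for illustration, not measured.

```python
import random
import statistics

# Hypothetical quality scores on an arbitrary 0-100 scale. These parameters are
# illustrative assumptions, not measurements.
ANALOG = {"shots": 20, "mean": 60.0, "sd": 5.0}     # few, careful, costly clicks
DIGITAL = {"shots": 500, "mean": 50.0, "sd": 15.0}  # many, cheap, careless clicks


def shoot(shots: int, mean: float, sd: float, rng: random.Random) -> list[float]:
    """Draw one quality score per picture from a normal distribution."""
    return [rng.gauss(mean, sd) for _ in range(shots)]


def summarize(scores: list[float]) -> dict[str, float]:
    """Return the average, best, worst, and spread of a batch of pictures."""
    return {
        "average": statistics.mean(scores),
        "best": max(scores),
        "worst": min(scores),
        "stdev": statistics.stdev(scores),
    }


rng = random.Random(2005)  # fixed seed so the sketch is reproducible
analog = summarize(shoot(**ANALOG, rng=rng))
digital = summarize(shoot(**DIGITAL, rng=rng))

for name, s in (("analog", analog), ("digital", digital)):
    print(f"{name:>7}: average={s['average']:.1f} best={s['best']:.1f} "
          f"worst={s['worst']:.1f} stdev={s['stdev']:.1f}")
```

With these made-up parameters, the digital batch should come out with a lower average, a worse worst picture, and a better best picture than the analog batch, which is the pattern Fun Fact #4 and the hunch about books describe. Different assumed parameters can produce a different pattern, so the sketch illustrates the argument rather than proves it.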

Greater variance and lower average quality coupled with an immensely larger stock leads us to the bad news about today’s information age. But first, a few more fun facts.

Fun Fact #5: The cost of accessing information is going down.

Let’s just say “google.” Enter some keywords and you will get about four million hits, give or take a few million. Marginal cost to you: nearly zero (assuming that you have a connected computer at your disposal).

Another way of putting this is to say that the “channel capacity” has increased: information can flow to you through a vast pipe if you need it.

Fun Fact #6: The cost of identifying the information is going up.

So you do get four million hits in less than 0.4 seconds when you run that search on Google. But unless you are very lucky, or have been very clever in specifying the search, it will take you a lot of time to sort through it all to find the information you need.

For any given stock of information, the lower the average cost of accessing the information, the higher the search cost for any specific required information. Here I would like to state what I call the Information Cost Complementarity Principle: for a given cost, quality and quantity are complementary. You can have high quantity but will have to put up with low quality; or you can have high quality, but it will cost you. (Compare the folk wisdom: “Good, fast, cheap: choose two.”) Another way of putting it: the cost of searching and the cost of sorting through the results of the search are complementary.

I model the principle after Niels Bohr, one of the founders of quantum mechanics. He stressed the importance of the complementarity principle and held that it has wide applicability in areas far removed from physics. The principle says that knowledge of one aspect of a system precludes knowledge of certain other aspects of the system. (Ref: Steven Weinberg’s Dreams of a Final Theory, p. 74.) Heisenberg’s uncertainty principle is an instance of the complementarity principle: exact knowledge of a particle’s position (or momentum) precludes exact knowledge of the particle’s momentum (or position).

Fun Fact #7: The cost, therefore, of obtaining knowledge has gone up.

Recall that we are not really interested in information for its own sake; we are interested in knowledge. Being handed four million hits in return for a search is about as helpful as being thrown both ends of a rope when one is drowning.

The claim that the cost of knowledge has gone up at the same time as the cost of information has dropped dramatically is clearly counter-intuitive. Counter-intuitive, but consistent with the facts. Low-quality information is cheap. That leads to what I call “information obesity,” which is inconsistent with “knowledge health.”

For all of our evolutionary history, food was not easily available, and so we evolved to subsist on a low-calorie diet. Suddenly (on evolutionary time scales), calories in developed countries are abundant because low-quality, high-calorie foods are cheap, while high-quality, low-calorie foods are expensive. So in rich nations such as the US, the poor suffer disproportionately more from obesity than the rich do. This is in contrast to poor nations, where it is the rich who are obese. Obese people are calorie-rich but health-poor.

I contend that one can be information-rich and knowledge-poor, and further, that in an information-overloaded society, the poor will be information-rich and knowledge-poor, while the rich will be information-poor but knowledge-rich.

After all this talk, let me come to the point that I want to make: there is a big opportunity in managing information overload. Create a filter that lets only top-quality information through, and people will beat a path to your door. You might call it a Super Filter, one that filters out not just spam but low-quality non-spam content as well. In the past, portals that gave you everything (when there was not very much of anything) were a big hit. Now (when there is much too much of everything), portals that give you access to highly selective, exclusive content will make it big.

In one specific area — education — I have figured out how to manage the information overload problem. Perhaps we should talk about it later. Or perhaps not.

Follow-up posts: “The Age of Superfluous Information” and “The World is (Information) Fat: Followup.”

Author: Atanu Dey

Economist.


9 thoughts on “The World is (Information) Fat”

TTG says:

June 7, 2005 at 4:54 pm

Hey,
Managing the information load is already being done in small pockets, using natural language processing and collaborative filtering, but both are currently used in specialised areas and are still pretty crude at best. One of the REAL reasons that Amazon.com is successful is its attempt to build a “community” and to manage the information: “Customers who bought X also bought Y.” One of the projects at my former university had to do with both natural language processing and collaborative filtering. The conclusion they came up with is that it’s really difficult to do, even with an Ivy League university’s computer labs at your disposal!

pardeshi says:

June 8, 2005 at 11:37 pm

Dear Atanu,
I don’t share your hunch about books: are the classics of today better than those of the past? What is the definition of the best book?

As an ordinary citizen, I depend on the commentary of literary critics, insofar as I have confidence in their writing. For the rest, one relies on collaborative filtering, as TTG puts it. I am waiting for the heaven-sent ultimate Filter.
Anyway, it’s good to hear about Niels Bohr and Heisenberg.

Uday says:

June 13, 2005 at 10:35 am

As usual, a thought-provoking article that makes me periodically check this blog!

I would like to question the validity/implications of 4, 6, and 7. As we all know, search efficacy has been taken to new levels by Google, and Inktomi/Yahoo! and MSN are not slacking off either. I wonder if you are implicitly questioning the limitations of the ranking algorithms used by these companies. As the amount of information increases by leaps and bounds, the total quantum of search results for a particular phrase will keep increasing proportionally. But I would suppose the key challenge is to rank-order the search results; after all, how many people go beyond the first few pages of any search result?

Put another way, would there ever be a real need to sift through information ranked beyond the top 50, or would a person be better off refining the query to zero in more effectively on what was sought? I believe it is the latter; hence the amount of information should not affect the quality of refined queries. But yes, refined queries will increasingly be needed to replace coarse and aggregate ones.

I contend that this is an “intellectual” cost. It will be simplified for the masses if leading-edge companies can provide intuitive means to elicit query refinement. That is, an important question is how the search companies could provide artifacts beyond the existing ones to allow users to phrase their queries better and obtain exactly what they seek. Google, for one, has done tremendously well on such aspects: Google Suggest completes search phrases as you type (offering suggestions), the “:” prefixes are extremely handy and work very effectively, and the seamless carry-over of context to news, images, local, etc. is so convenient. I don’t know if I am missing your point, Atanu, but I reckon that bright engineers have hit upon very good metaphors to enable searching for exactly what you seek. I grant that these tools can be expanded upon, and are taken for granted by the programming-savvy; if and how the larger population can effortlessly imbibe these skills depends on the user interface exposed and the rate of exposure (so as not to intimidate the consumer base).

Another question is whether people in general have the communication skills to express exactly what they seek. It is very instructive to look at some of the results on Google Answers and at what the experts searched on to provide the answer. People seem to be all over the spectrum when it comes to the ability to express their needs smartly and succinctly. As with most things in life, yet again, we see the 80-20 rule at work 🙂

Finally, would you have any stats or thoughts on what proportion of the growth of information is on largely “new” topics and how much represents “accretive” bloat? This is surely an ill-formed question, because the definitions of ‘new’ and ‘accretive’ in this context are themselves fuzzy.

[Disclaimer: I don’t work for a search company, and may be demonstrating gross ignorance of progress in the area].

Taran says:

June 19, 2005 at 10:46 am

It’s also no accident that the most popular information is not the most accurate. The more popular information has less discriminating audiences.

Consult the book of Armaments! (sorry. just popped out)

Pingback: Atanu Dey on India’s Development » The World is (Information) Fat: Followup
Stephen Stohs says:

July 24, 2005 at 8:28 am

Discussing information overload in education creates the risk of adding to the extant overload. So let’s talk about Google’s share prices instead.

Google sells ads on the basis of how many eyeballs it attracts, but if most owners of said eyeballs never buy anything mentioned in those ads, then is a $300+ stock price for Google shares sustainable?

Pingback: Atanu Dey on India’s Development » The Age of Superfluous Information
Pingback: Atanu Dey on India’s Development » Information Overload
