WhatTheyThink

Premium Commentary & Analysis

How much information is there?

Friday, November 16, 2007

Once, a petabyte seemed like an enormous amount of data that no one would ever need to store. The Library of Congress, with 130 million items on 530 miles of bookshelves (including 29 million books, 2.7 million recordings, 12 million photographs, 4.8 million maps, and 58 million manuscripts), needs only about 10 terabytes (10,000 gigabytes).
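
The 10 TB figure is easier to judge as a per-book average. Here is a minimal sketch. It assumes decimal units and that the estimate covers only the text of the 29 million books; the article does not say which items the figure includes.

```python
# Rough check of the Library of Congress figure in the text.
# Assumptions: decimal units (1 TB = 10**12 bytes) and that the
# ~10 TB estimate covers only the text of the 29 million books.
LOC_BYTES = 10 * 10**12
BOOKS = 29_000_000

bytes_per_book = LOC_BYTES / BOOKS  # average bytes per book
print(f"about {bytes_per_book / 1000:,.0f} KB per book")
```

At roughly 345 KB per book, the estimate is in the range of plain text. Scanned page images would need far more space.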

I owned a PC XT with a 10 MB drive in 1983. Since then I have made the incremental leaps to 20, 40, 80, 120, and 500 MB, and then to 40, 80, 120, and 250 GB, so I can see a terabyte in my future. I carry a 4 GB thumb drive around all the time.
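
The upgrade path above implies a capacity doubling time. You can compare that figure with Moore's Law, which is usually stated as a doubling of transistor counts about every two years. Here is a minimal sketch. It assumes decimal units and that the 250 GB drive arrived in 2007, the year of this article.

```python
import math

# Assumptions: 10 MB drive in 1983, 250 GB drive in 2007 (the article's
# year), decimal units throughout.
start_bytes = 10 * 10**6
end_bytes = 250 * 10**9
years = 2007 - 1983

growth = end_bytes / start_bytes    # overall capacity multiple
doublings = math.log2(growth)       # number of times capacity doubled
doubling_time = years / doublings   # average years per doubling

print(f"{growth:,.0f}x growth, {doublings:.1f} doublings, "
      f"one every {doubling_time:.2f} years")
```

Under these assumptions capacity doubled about every 1.6 years. That pace is faster than the two-year figure.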

[Image: IBM PC XT]

A petabyte is the frontier beyond the terabyte: 1,024 gigabytes equal one terabyte, and 1,024 terabytes equal one petabyte. EMC has launched its first petabyte array, a version of its Symmetrix DMX-3 system built around 500 GB drives. The petabyte configuration fills nine room-filling cabinets with 2,400 of those drives and sells for about $4 million. Storage marches on, and in most cases its growth outpaces Moore's Law.
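
You can check the 2,400-drive configuration against both unit conventions. This minimal sketch makes two assumptions. First, the 500 GB drive rating is in decimal gigabytes (10**9 bytes), as drive makers generally quote. Second, it counts raw capacity, before any RAID or formatting overhead.

```python
# Raw capacity of the petabyte DMX-3 configuration: 2,400 drives x 500 GB.
# Assumption: the drive rating is in decimal gigabytes (10**9 bytes).
DRIVES = 2_400
DRIVE_BYTES = 500 * 10**9
PRICE_USD = 4_000_000

raw_bytes = DRIVES * DRIVE_BYTES
decimal_pb = raw_bytes / 1000**5  # 1 PB = 1,000**5 bytes
binary_pb = raw_bytes / 1024**5   # 1 PB = 1,024**5 bytes, as in the text

print(f"{decimal_pb:.2f} PB (decimal), {binary_pb:.2f} PB (1,024-based)")
print(f"about ${PRICE_USD / (raw_bytes / 10**9):.2f} per raw GB")
```

Under those assumptions the array holds 1.20 decimal petabytes, or about 1.07 petabytes in 1,024-based units, at roughly $3.33 per raw gigabyte.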

What if you wanted to digitize everything we hear, see, and say:

    * There were 4,615 films last year. At 5 MB/sec and an average running time of 7,200 seconds (two hours), that would be 166 terabytes. Some movies are eschewing film and going directly to digital.
    * There are about 52 billion photographs taken each year in the world. If each of those is a 10 KB JPEG, that comes to 520 terabytes. But resolution levels are increasing, as is the number of digital photos taken, so the number could be many times higher, even with compression.
    * We have 1,593 television stations. If each sends out 5 MB/sec for 30 million seconds per year, that is over 200 petabytes. High definition will increase that volume.
    * Sales of recorded music were 407 million CDs and 336 million cassettes (and still 10 million vinyl discs). Assuming 550 MB for each CD and cassette, that would be about 400 petabytes, much of it duplicated, of course. If the number of different recordings for sale is about 30,000, the unique content would be only about 16.5 terabytes worldwide. The music industry is in the middle of the iPod revolution as it converts to a digital world.
    * The largest storage requirement would come from converting all telephone conversations to digital form. 500 billion call-minutes would be 4,000 petabytes of digitized voice. Cell phones have increased the volume of calls, so the number is growing.
    * The Web has been growing 10-fold each year. Current estimates put the number of Internet users at over one billion. The content of all websites and email probably runs into the high petabytes.
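
Each bullet above multiplies a count by a size, and sometimes by a duration, so the arithmetic is easy to recheck. Here is a minimal sketch that uses decimal units and only the inputs stated in the list. The telephone bullet gives no data rate, so the sketch backs out the rate that the 4,000 PB figure implies.

```python
# Recompute the back-of-the-envelope estimates from the list, using only
# the inputs it states. Decimal units throughout (1 TB = 10**12 bytes).
KB, MB = 10**3, 10**6
TB, PB = 10**12, 10**15

estimates = {
    "films": 4_615 * 7_200 * 5 * MB,                # films x seconds x 5 MB/s
    "photos": 52 * 10**9 * 10 * KB,                 # photos x 10 KB JPEG
    "television": 1_593 * 30 * 10**6 * 5 * MB,      # stations x seconds x 5 MB/s
    "music sales": (407 + 336) * 10**6 * 550 * MB,  # CDs + cassettes x 550 MB
    "unique music": 30_000 * 550 * MB,              # distinct recordings x 550 MB
}

for name, size in estimates.items():
    print(f"{name:>12}: {size / TB:>12,.1f} TB = {size / PB:>8,.3f} PB")

# The telephone bullet states no data rate, so back out the rate implied
# by 4,000 PB spread over 500 billion call-minutes.
implied_rate = 4_000 * PB / (500 * 10**9 * 60)
print(f"telephone: implies about {implied_rate / KB:,.0f} KB per second of call")
```

For comparison, standard 64 kbit/s digital telephony (G.711) works out to 8 KB per second.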

What is in a...

    * Bit: Yes or No, On or Off, Zero or One
    * Byte (8 bits): A single character, symbol, or glyph
    * Kilobyte (1,024 bytes): One page of text
    * Megabyte (1,024 kilobytes): A small book of text on a 3.5-inch floppy disk
    * Gigabyte (1,024 megabytes): A movie at TV quality
    * Terabyte (1,024 gigabytes): 50,000 trees made into paper and printed
    * Petabyte (1,024 terabytes): 2 years of earth satellite data
    * Exabyte (1,024 petabytes): All words ever spoken by human beings (grunts not counted)
    * Zettabyte (1,024 exabytes): Everything that there is
    * Yottabyte (1,024 zettabytes, or roughly a 1 followed by 24 zeros: 1,000,000,000,000,000,000,000,000 bytes): Everything that there will be
    * ??? (1,024 yottabytes)

We need a name for the next level. Humungabyte? Tyranobyte? Suggestions?
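
The table steps up by 1,024 at each level, but "a 1 followed by 24 zeros" is a power of ten. The two conventions drift further apart with every step. This short sketch prints each 1,024-based unit next to the matching power of ten. The 1,024 convention follows the table; the decimal comparison is added here for illustration.

```python
# Each unit in the table is 1,024 times the one before it. Compare the
# resulting byte counts with the matching power of ten.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

for power, name in enumerate(UNITS, start=1):
    binary = 1024 ** power   # 1,024-based size, as in the table
    decimal = 1000 ** power  # nearest decimal counterpart
    print(f"1 {name:<9} = {binary:>33,} bytes "
          f"({binary / decimal:.3f} x 10**{3 * power})")
```

By the last row, a 1,024-based yottabyte is about 1.209 x 10**24 bytes, some 21 percent more than a 1 followed by 24 zeros.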

Read the executive summary of the How Much Information? study for a year-2000 attempt to figure it out.

We are now able to save in digital form everything visible, audible, or communicated. Today this memory is highly duplicative, with billions of copies of popular music, programs, files, images, and so on stored in different places for different reasons. Tomorrow, everyone will be online over high-speed connections and site license agreements will be in wider use. It may then be common for PCs, PDAs, and cell phones to fetch anything from anywhere, eliminating the need for local storage.

Search engines have changed the way we do research. If Google has its way, every book, magazine article, and more will be searchable. How many of you remember when research meant wading through the Wilson Readers' Guide to Periodical Literature and then having to find the actual printed magazine?

Professor Landauer estimated that people take in and remember only about a byte a second. A typical lifetime is 25,000 days, or about 2 billion seconds (counting time asleep). The result is about 2 gigabytes, something that fits on a thumb drive. (Some folks could fit on a floppy disk, but I carry a 4 GB thumb drive.) With 6 billion people on earth, that makes the total memory of all the people now alive roughly 12,000 petabytes. I can state personally that memory decreases as one ages. We may be able to store digitally everything that everyone remembers.
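
Carrying the stated inputs through without rounding shows how much each assumption contributes to the total. This minimal sketch takes the text's figures at face value: one byte per second, a 25,000-day lifetime, and 6 billion people. It uses decimal units.

```python
# Landauer-style estimate using the inputs given in the text:
# about one byte remembered per second over a 25,000-day lifetime.
# Decimal units (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
BYTES_PER_SECOND = 1
LIFETIME_DAYS = 25_000
POPULATION = 6 * 10**9

lifetime_seconds = LIFETIME_DAYS * 24 * 60 * 60   # seconds in a lifetime
per_person = BYTES_PER_SECOND * lifetime_seconds  # bytes remembered per person
everyone = per_person * POPULATION                # all people now alive

print(f"{lifetime_seconds:,} seconds per lifetime")
print(f"{per_person / 10**9:.2f} GB per person")
print(f"{everyone / 10**15:,.0f} PB for {POPULATION:,} people")
```

Without rounding, that is about 2.16 GB per person and just under 13,000 PB for everyone alive.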

Plato suggested that writing would "create forgetfulness in the minds of those who learn to use it." Why memorize something if you can find it in hard or soft copy? There will be enough disk space and tape storage in the world to store everything people create, write, say, perform, or photograph. Eventually the average piece of information will never be looked at by a human. We will then need ways to evaluate everything automatically and decide what deserves the precious resource of human attention.

Today the digital library community spends some effort on scanning, compression, and metadata; tomorrow it will have to focus almost exclusively on selection, searching, and quality assessment. Input will not matter as much as relevant choice. Missing information won’t be on the tip of your tongue; it will be on the tip of your bits. In a digital world, the challenge will be finding that which is relevant.

So, how much information is there? A few thousand petabytes, give or take a terabyte.



About Frank Romano

Frank Romano has spent over 60 years in the printing and publishing industries. Many know him best as the editor of the International Paper Pocket Pal or from the hundreds of articles he has written for publications from North America and Europe to the Middle East to Asia and Australia. Romano lectures extensively, having addressed virtually every club, association, group, and professional organization at one time or another. He is one of the industry's foremost keynote speakers. He continues to teach courses at RIT and other universities and works with students on unique research projects.
