Google's crawlers, the small programs that comb the web so that its pages can be found through search engines, have hit their 1,000,000,000,000th unique URL, although this doesn't take into account the millions more sites that are not linked to or that deliberately exclude themselves from search engines.
Of course, 1 trillion unique URLs does not mean 1 trillion unique web pages; original content is much harder to come by.
Replication
"Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other," explains the Google blog.
"Even after removing those exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day."
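To make the "multiple URLs, same content" point concrete, here is a minimal sketch of spotting exact duplicates by hashing each page body. This is only an illustration, not how Google does it (the blog post doesn't say); the function name `group_by_content` and the example.com URLs are made up for the example.

```python
import hashlib
from collections import defaultdict

def group_by_content(pages):
    """Group URLs whose bodies are exactly identical.

    `pages` maps URL -> page body (str). Returns a list of URL groups,
    one group per distinct body.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        # Identical bodies produce identical digests, so they share a key.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return list(groups.values())

# Hypothetical sample data: two URLs serve the same home page.
pages = {
    "http://example.com/": "<h1>Home</h1>",
    "http://example.com/index.html": "<h1>Home</h1>",
    "http://example.com/about": "<h1>About</h1>",
}
unique_pages = group_by_content(pages)
print(len(pages), "URLs,", len(unique_pages), "unique pages")
# prints: 3 URLs, 2 unique pages
```

Note this only catches exact copies. Pages that differ by a timestamp or a session ID would need some kind of near-duplicate detection, which is a harder problem.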
How many unique pages?
The fascinating blog post, by Jesse Alpert and Nissan Hajaj, also acknowledges that there are, technically, infinitely many unique pages, but that indexing them all would be pointless.
"So how many unique pages does the web really contain? We don't know; we don't have time to look at them all! :-)
"Strictly speaking, the number of pages out there is infinite -- for example, web calendars may have a "next day" link, and we could follow that link forever, each time finding a "new" page.
"We're not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what's a useful page, and there is no exact answer."
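The calendar example is easy to reproduce. Below is a rough sketch of a crawler that stops following links after a fixed depth, which is one simple way to avoid walking a "next day" chain forever. The blog post doesn't describe how Google actually handles this, so treat the depth limit as an assumption; the `calendar_links` site, its URL layout, and the start date are all invented for the example.

```python
from collections import deque
from datetime import date, timedelta

def calendar_links(url):
    """Hypothetical site: every day page links to the following day, forever."""
    day = date.fromisoformat(url.rsplit("/", 1)[1])
    next_day = day + timedelta(days=1)
    return ["http://example.com/calendar/" + next_day.isoformat()]

def crawl(start_url, get_links, max_depth):
    """Breadth-first crawl that does not follow links beyond max_depth hops."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth budget used up: do not expand this page
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

found = crawl("http://example.com/calendar/2008-01-01", calendar_links, max_depth=5)
print(len(found))  # 6: the start page plus five "next day" pages
```

Without the `max_depth` check the loop would never end, since every page yields a URL that hasn't been seen before. That is exactly the "infinite pages" situation the quote describes.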