May 15, 2006

10 Things You Might Not Know About Google

Philipp Lenssen

This article is written by Philipp Lenssen as part of the Blog Swap with Seth Finkelstein – Seth's article on 10 Things You Might Not Know About Censorware can be found at Philipp's blog.


1. Google query syntax underwent some subtle changes over the years.

Not too long ago, you couldn't enter more than 10 words into the Google search box. Or to be more precise, you *could*, but subsequent words were ignored. I bet the Google founders were thinking "10 words ought to be enough for everyone," and mostly they were right – but for some advanced uses, especially with the Google Search API, a little more is helpful. Then, a while ago, Google increased the word limit to 32. This is probably OK for a few more years!
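To see what such a limit means for a long query, here is a minimal Python sketch. It assumes the limit counts whitespace-separated words, which is a guess about how Google tokenizes; the function name `split_at_limit` is made up for illustration.

```python
# Assumption: the limit counts whitespace-separated words.
WORD_LIMIT = 32  # the current limit mentioned above; it used to be 10


def split_at_limit(query, limit=WORD_LIMIT):
    """Return (kept, ignored) word lists for a query under a word limit."""
    words = query.split()
    return words[:limit], words[limit:]


# Simulate the old 10-word limit on a 12-word query.
kept, ignored = split_at_limit(
    "one two three four five six seven eight nine ten eleven twelve", limit=10
)
print("searched:", " ".join(kept))
print("ignored: ", " ".join(ignored))
```

If you build queries in a script, a check like this tells you early which words would silently fall off the end.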

Another change is that Google no longer ignores stop words. Stop words in search engines are words like "the" or "a" which are too tiny or common to be useful additions to most searches. Google now accepts them as semi-normal words (one remaining difference being that they're not highlighted, or linked to the dictionary). This means that on Google.com you get different results when you search for [the tale of a cowboy] vs [* tale * * cowboy] vs [tale cowboy]. (I'll be using square brackets around search queries – they're not to be included in the search.)
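If you want to try this comparison with your own queries, the three variants can be generated with a few lines of Python. This is only a sketch: the `STOP_WORDS` set is a tiny illustrative list, not the list Google actually uses, and `query_variants` is a hypothetical helper.

```python
# Illustrative stop-word list only; Google's real list is not public.
STOP_WORDS = {"the", "a", "an", "of"}


def query_variants(query):
    """Return the original query, a wildcard version, and a stripped version."""
    words = query.split()
    # Replace every stop word with the * wildcard.
    wildcarded = " ".join("*" if w.lower() in STOP_WORDS else w for w in words)
    # Drop every stop word entirely.
    stripped = " ".join(w for w in words if w.lower() not in STOP_WORDS)
    return [query, wildcarded, stripped]


for variant in query_variants("the tale of a cowboy"):
    print("[" + variant + "]")
```

Paste each printed query (without the brackets) into Google and compare the results.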

Another operator changed its functionality over the years: a couple of years ago, you could only query Google for [site:something.com], but not [site:something.com/something/]. Today, you can add folders to the site operator.
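For those who build such queries in scripts, here is a small sketch of putting a folder into the site operator and turning the query into a search URL. The helper names `site_query` and `search_url` are invented for this example, and the `/search?q=` URL pattern is assumed from what the browser address bar shows after a search.

```python
from urllib.parse import urlencode


def site_query(domain, folder="", terms=""):
    """Build a site: query, optionally restricted to a folder."""
    target = domain.strip("/")
    if folder:
        target += "/" + folder.strip("/") + "/"
    return ("site:" + target + " " + terms).strip()


def search_url(query, host="www.google.com"):
    """Build a search URL for a query (URL pattern is an assumption)."""
    return "https://" + host + "/search?" + urlencode({"q": query})


query = site_query("something.com", folder="something")
print(query)
print(search_url(query))
```

Passing a different `host` lets you run the same query against another Google domain.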

2. Google itself was Beta.

These days, everyone puts a Beta tag on their 2.0-ish web app. But did you know back in 1998, when Google launched their search, it was also in Beta? Take a look at a copy stored in the WayBack Machine to see it. Be aware the page might look quite ugly by today's standards... heck, it was probably ugly even back in 1998 (then again, so was my homepage in 1998!).

3. PageRank more than 1-10 – maybe.

While no one outside Google knows for sure, it is often speculated that Google's PageRank value – the "authority rank" (or quantity of backlinks which themselves receive lots of backlinks) – is a much more precise number than the plain 1, 2, 3... 10 values. A float, not an integer, if you will.

So, for example, if you're looking at a site which shows a PageRank 8 in the Google Toolbar, its internal PageRank may be something like 8.355 (or however precise Google's number is). But we don't know for sure – maybe Google's algorithms prefer speed over quality when it comes to the recursive PR calculations of billions of pages. This calculation might not be a breeze even for Google's 10,000 - 200,000 computers (that's another number we can't be too sure of outside of Google).
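To give an idea of why this recursive calculation is expensive, here is a toy Python sketch of the textbook PageRank iteration, in the normalized form where all ranks add up to 1. It is not Google's production code: the four-page `toy_web`, the damping value of 0.85 (the figure commonly cited from the original paper) and the fixed 50 iterations are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. links maps each page to the list of pages it links to.

    Assumes every linked-to page also appears as a key in links.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal ranks
    for _ in range(iterations):
        # Every page gets a small base share, independent of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page without outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Each pass touches every link once, and the ranks only settle after many passes. Now imagine that with billions of pages. Note that the values here are fractions, so whatever the Toolbar shows would have to be some rescaled version of the internal number.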

4. Google's co-founders didn't like each other in the beginning.

I guess when you're an uber-geek, like Google founders Larry Page and Sergey Brin are, you are also very competitive (to the point of risking arrogance towards slower thinkers, maybe). John Battelle, in his book The Search (pages 67-68), tells of how the two met at Stanford University in the summer of '95:

Like most schools, Stanford invites potential recruits to the campus for a tour. But it wasn't on the pastoral campus that Page met Brin – it was on the streets of San Francisco. Brin, a second-year student known to be gregarious, had signed up to be a student guide of sorts. His role that day was to show a group of prospective first-years around the City by the Bay.

Page ended up in Brin's group, but it wasn't exactly love at first sight. "Sergey is pretty social; he likes meeting people," Page recalls, contrasting that quality with his own reticence. "I thought he was pretty obnoxious. He had really strong opinions about things, and I guess I did, too."

"We both found each other obnoxious," Brin counters when I tell him of Page's response. "But we say it a little bit jokingly. Obviously we spent a lot of time talking to each other, so there was something there. We had a kind of bantering thing going."

5. Google has 16 official blogs.

You might have come across the official Google Blog. But did you know Google actually has 16 different – and all official – blogs (give or take one)? Here's the full list (I'm also collecting these all on one page):

  1. Google Blog - googleblog.blogspot.com
  2. Google Talkabout - googletalk.blogspot.com
  3. Google Base Blog - googlebase.blogspot.com
  4. Google Video - googlevideo.blogspot.com
  5. Inside Google Desktop - googledesktop.blogspot.com
  6. Google Code - code.google.com
  7. Inside AdWords - adwords.blogspot.com
  8. Inside AdSense - adsense.blogspot.com
  9. Google Reader Blog - googlereader.blogspot.com
  10. Blogger Buzz - buzz.blogger.com
  11. AdWords API Blog - adwordsapi.blogspot.com
  12. Google Enterprise Blog - googleenterprise.blogspot.com
  13. Google Research - googleresearch.blogspot.com
  14. Google Maps API Blog - googlemapsapi.blogspot.com
  15. Google Writely - writely.blogspot.com
  16. Inside Google Book Search - booksearch.blogspot.com

6. Google self-censors in several countries.

You've probably heard about how Google self-censors in China (e.g. human rights sites top-ranked by Google in other countries are missing in Google.cn). But did you know that Google showed censored search results in other countries for years, sometimes even without showing a disclaimer that something was missing? In Germany and France, that was the case.

You can see this for yourself if you first search Google.com for [site:ety.com]. This will result in 9,940 results. Now if you do the same search on Google.fr – Google France – you get zero results. However, there's a disclaimer at the bottom:

"In response to a legal request submitted to Google, we have removed 260 result(s) from this page. If you wish, you may read more about the request at ChillingEffects.org."

Note Google's disclaimer is showing the wrong number of missing pages – it's in the 1,000s, not 260. Following the link to Chilling Effects, we see this text:

Google received complaints prior to March 2005 about URLs that are alleged to be illegal under U.S. or local law. In response to these complaints, one or more URLs that would have appeared for this search were not displayed.

In other words, Google is not censoring this out of their own belief, but by following government requests. Now what's ety.com anyway, except being one of the many censored domains? A quick glance will show it's some kind of stupid Nazi propaganda site, illegal by some countries' standards. But you know what Voltaire said... "I may disagree with what you say, but I will defend to the death your right to say it."

7. Google stopped counting their index size.

Since around 2001, Google proudly showed off on their front page the number of pages they search through... a number that went from a billion and a half to over 8 billion (according to Google). Today, Google doesn't show this number anymore. Maybe Googlers – that's what Google employees are called – realized that results quality beats results quantity. Or maybe they just realized that by sheer numbers, competitors were winning. In August 2005, Yahoo in their blog announced:

As it turns out we have grown our index and just reached a significant milestone at Yahoo! Search – our index now provides access to over 20 billion items (...) [including] over 19.2 billion web documents

Today, when you want to find out about the Google index size, there's a workaround though: search Google for ["* *"] – that's a good estimate. Right now, it's displaying 25,270,000,000 pages. In a direct comparison, when we search for "the" on both Google and Yahoo, Google shows a couple of billion pages more. Then again, these numbers are hard to verify – Google only lets us see the first 1000 results for each query. And in the end, who wants to see more than that anyway? Most people don't even go beyond the first 10 results, and rather adjust their search query instead!

8. The Google API may offer over 1,000 requests.

If you're a developer utilizing the Google web search API, and you need way beyond the 1,000 requests per day Google offers by default, here's a tip: you can email the Google API support and request more hits for your API key. Depending on your projects and traffic needs, which you will have to outline, Google just might grant you the request!
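Before you email support, it can help to know how close you actually get to the limit. Here is a minimal sketch of a client-side counter for a daily request budget. The `DailyQuota` class is hypothetical and not part of any Google API; it assumes the quota resets once per calendar day, which may not match how Google counts.

```python
import datetime


class DailyQuota:
    """Client-side counter for a per-day request budget (illustrative only)."""

    def __init__(self, limit=1000):  # 1,000 is the default limit named above
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today=None):
        """Return True and count one request if budget is left for today."""
        today = today or datetime.date.today()
        if today != self.day:
            # Assumption: the budget resets when the calendar day changes.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


quota = DailyQuota(limit=1000)
if quota.try_consume():
    print("OK to send request, used so far today:", quota.used)
else:
    print("Daily limit reached, try again tomorrow")
```

Logging how often `try_consume` returns False over a week gives you concrete numbers to outline in your request.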

9. Google comic book search.

While Google doesn't have its own comic book search engine, you can still achieve good results by going to Google Images, setting the file size to "Large images", and then searching for [comics]. Using this setting, you can also search for an artist's name, like ["john byrne"], ["john romita jr"], ["frank miller"] or ["daniel clowes"]. You might even have some fun adding your own speech bubbles to the comic book pages you find (use a free font like WebLetterer for best results)...

10. Google Writely is a multi-user chat.

OK, so Writely – which Google recently acquired – is not really a chat, but an online word processor. However, by inviting others to your Writely document, you can group-edit any document... and see the changes by others merged into the document as you type! This feature allows you to chat with a group, and you can have fun with positioning text on different places on the screen, wiki-editing what others wrote, or adding colors and images.

By Seth Finkelstein | posted in google | on May 15, 2006 10:26 AM (Infothought permalink)
Seth Finkelstein's Infothought blog (Wikipedia, Google, censorware, and an inside view of net-politics) - Syndicate site (subscribe, RSS)


Comments

Some of these things are pretty basic, but a few were wicked interesting.
Thanks for sharing!

Posted by: Shuai King at May 15, 2006 03:24 PM

Hi Philipp and Seth-

A few more official Google blogs. (Point 5 above).

http://sitemaps.blogspot.com/ - Google SiteMaps Blog
http://googlechinablog.com/ - Google China Blog
http://googleitalia.blogspot.com/ - Google Italy Blog
http://googlejapan.blogspot.com/ - Google Japan Blog
http://googlekoreablog.blogspot.com/ - Google Korea Blog
http://googlemexicoblog.blogspot.com/ - Google Mexico Blog
http://adsense-de.blogspot.com/ - Google AdSense German Blog

That makes it 23 official blogs for Google.

Regards,
Chirayu

Posted by: Chirayu at May 15, 2006 03:25 PM

You forgot to mention that Google regularly screws website owners out of money honestly earned through their Google Adsense program.

My site, http://www.SolePM.com, started making lots of Adsense money when we finally figured out how to do effective marketing for the site.

Our traffic spiked, and in turn so did our Adsense profits.

Soon after, we were turned off, and the money being held for that month...nearly $500...was not given to us.

No reason.

No explanation.

No communication.

Screw Google, Adsense, and Adwords.

Posted by: Gary Tharaldson at May 15, 2006 03:26 PM

In regards to the search for [site:ety.com], Google US may show that it can return 9,000 search listings, but, in fact, it will only show 903 listings before giving the omitted-due-to-similar-results message.

Posted by: Nick at May 15, 2006 03:35 PM

So, on the basis of your own problems with google, you think that it regularly screws people out of ad money? I expect hundreds of millions of sites use google ads, this is a very isolated problem that just happened to hit you.

Posted by: Russell at May 15, 2006 03:42 PM

This is all new to me :)

Posted by: Pedro at May 15, 2006 03:52 PM

TrackBack

http://www.keithdsouza.com/google-news/google/ten-things-you-should-know-about-google.html

Posted by: Keith Dsouza at May 15, 2006 03:52 PM

We encountered the same response when our site http://www.musicscene.org started legally getting lots of traffic as well. Google turned off our Adsense account, seized our earned funds and refuses to respond. From many others I have talked with, this seems to be standard operating procedure for them.

Posted by: chrishawn at May 15, 2006 03:52 PM

Adsense payment problems aren't isolated to his issue. Many, many people have reported getting stiffed, with no recourse to recover their earned money nor explanation of why it occurred. Check the internet. Google it!

Feeling clever?

Posted by: Galactic Dominator at May 15, 2006 03:59 PM

You incorrectly quoted Voltaire above. It was not Voltaire who said "I may not agree with what you say, but I'll defend to the death your right to say it." That was Denis Diderot, and this is often miscredited.

Posted by: Brandon at May 15, 2006 03:59 PM

#11 Did you know google is the next microsoft?

Posted by: j$ at May 15, 2006 04:15 PM

Thanks for the tip off about the 10 word limit being raised to 32, I hadn't noticed that. That used to drive me nuts when trying to do more complex queries.

Posted by: Quartz Mountain at May 15, 2006 04:19 PM

The pagerank of a page is a probabilistic value, meaning that the sum of pageranks on all pages will always be equal to 1. With this in mind, it's not possible for a page to have a pagerank of 8.355. 0.8355 would be valid, maybe, if the page in question was the most authoritative and popular on the internet, but never 8.355.

Posted by: simon at May 15, 2006 04:19 PM

Completely unrelated, but I had to mention it. The phrase "I may disagree with what you say, but I will defend to the death your right to say it" attributed to Voltaire was not created by him. See http://www.classroomtools.com/voltaire.htm

Posted by: Bruno G. Albuquerque at May 15, 2006 04:27 PM

About the number of computers Google has: An anonymous source told me that it is now over 1 million. Not 100,000 as was previously believed.

Posted by: x at May 15, 2006 04:41 PM

1 million thats bullshit u lie

Posted by: x2 at May 15, 2006 05:04 PM

> That was Denis Diderot, and this
> is often miscredited.

Now I know, thanks :)

Posted by: Philipp Lenssen at May 15, 2006 05:21 PM

For more things you might not know about Google, visit www.googleguide.com/example_ref.html

Posted by: Nancy Blachman at May 15, 2006 05:47 PM

"That was Denis Diderot, and this is often miscredited."

It's not Denis Diderot, it's a little-known writer, Evelyn Beatrice Hall.

http://en.wikiquote.org/wiki/Evelyn_Beatrice_Hall

Posted by: Shii at May 15, 2006 06:38 PM

Congrats, Seth, you're on Reddit.

Posted by: David at May 15, 2006 06:57 PM

excellent!!

Posted by: s0ng0 at May 15, 2006 08:22 PM

Lots of people have been screwed out of their money by google.

Check out this guy, for example:
http://www.newdelhitimes.org/archives/2006/05/testing_people.html

Posted by: Jonathan Boutelle at May 15, 2006 11:22 PM

Oops sorry, wrong link.
Here it is.
http://www.newdelhitimes.org/archives/2006/05/do_no_evil_they.html

Posted by: Jonathan Boutelle at May 15, 2006 11:24 PM

Not only does Google NOT pay you when you generate real genuine traffic, also they cancel your account with no way to return to AdSense.
And yes, no explanations given.

Posted by: Joop at May 16, 2006 04:48 AM

I didn't know that google allows more than 10 words now! Thats good news. I used to be very annoyed at the previous limit.

Posted by: hacker not cracker at May 16, 2006 01:06 PM

If you don't like Google AdSense, don't put it on your site!! The explanation is click fraud. If it was you paying for the click, you would want to know they are genuine.

Posted by: Luke Gunderson at May 16, 2006 08:16 PM

Thanks a lot for the info.

Posted by: Raman at May 16, 2006 10:02 PM

Interesting post, thanks! As a web designer, it pays to keep an eye on what's going on at Google....these tidbits make it fun!

Posted by: Arvana at May 17, 2006 11:50 AM

google sucks

Posted by: jv at May 17, 2006 01:21 PM

I understand that Google, as well as other companies with such programs, might well be concerned about fraud. After all, we all know it is rampant, we see it everywhere. However, I would suggest that what Google should do in that case is initiate a transparent fraud investigation program, perhaps in cooperation with law enforcement. They should give people their money and then investigate and perhaps sue (or criminally charge) offenders if that is indicated. Either that, or they should lobby for the creation of laws allowing them to withhold funds in suspect cases, pending the outcome of a thorough, transparent, and time-limited investigation. Time limitation is important, to prevent this provision from being abused. That may be a legal way to do it, but I'm still not sure it's ethical in my book. What Google is alleged to have done is certainly not ethical or legal.
If this is happening, and enough noise is made about it and light cast upon it, any company so engaged should feel pressure to change. That pressure would come from the press (hopefully), the marketplace, and from the offices of attorneys general around the country. At least that's how it should work. With the corporatists in power in our government today all bets may be off. Please be sure to vote this fall.

Posted by: Feral at May 17, 2006 04:04 PM

Google is Google. There's really nothing you can do about it. Like it or not. But you can't sit here and complain about it when you still continue to use it. We know it's not perfect. If you don't like the way they do things don't continue to refer back to them. About Google not giving people their money: it's not cool. Something should be done about it. Much love.

Posted by: Clarissa at May 19, 2006 08:27 PM

If you know it wasn't Voltaire, can I ask why you've not corrected the article?

Posted by: Meh at May 20, 2006 02:46 PM

And there are More Google Search Tools that you did not know about.

Posted by: rikki at May 21, 2006 06:40 AM

if you type in [site:ety.com] into google.co.uk you don't get anything - and no disclaimer.

Posted by: mememe at May 21, 2006 09:46 AM

well this is gr8! facts about google...

Posted by: hmm at May 22, 2006 12:44 PM

Many people don't realise that the order of the search words is important.

Posted by: Kirit at May 24, 2006 04:53 AM

The Google Beta one was fascinating. I guess everything they do is some type of evolution, even themselves!

Posted by: Wes at May 31, 2006 05:24 PM