October 27, 2003

Whitehouse.gov iraq robots.txt directories - an explanation?!

Update 10/28: The White House says it's merely a design issue, from

http://www.2600.com/news/view/article/1803

Per: http://www.bway.net/~keith/whrobots/whresp.html


[(10/27) Just sent this to Dave Farber's list, about the whitehouse iraq robots.txt directories (update: for more background, see http://www.bway.net/~keith/whrobots/ )]

Archived at

The White House And Iraq Directories
http://sethf.com/domains/whitehouse-iraq/

Dave, I've been analyzing the robots.txt file, exactly because the directories are so strange. I have a theory on what's happened. But it's so jaw-dropping that I'm hesitant to rush it into a formal report/release. In short:

There's no conspiracy.

There's a real-life instance of the joke genre which runs "I thought you said ..."

For example, here's one of the jokes: "After a California earthquake, Dan Quayle is sent to visit the most damaged site. But he never arrives there. Finally, he's found in Florida. He says, shocked, 'Go to the EPIcenter? I thought you said ...'" [EPCOT Center]

The joke here? Someone said:

"Don't have the search engines looking at the Iraq documents index"

And that was heard as:

"Don't have the search engines looking at every "index" with Iraq"

Really!

The evidence for this is that the robots.txt file has lines for

Disallow: /disk2/www/htdocs/infocus/iraq
Disallow: /disk2/www/htdocs/infocus/iraq/news/infocus/iraq

These are the only "iraq" lines with no matching "text" counterpart anywhere in the file. They're obviously special in some way. And they look like they point to a searchable index.

Then there's the fact that some people confuse directories, the function of the file "index.html", and the way a bare directory will display as "Index of <directory name>" on some servers.

So ... "Iraq index" ... "Index of <directory name>" ... Oooops!

Never attribute to malice that which can be explained by stupidity.

This is hard to believe. But it fits!


Update - the robots.txt file has been changed. Grab it from

http://sethf.com/domains/whitehouse-iraq/wh-robots.txt

Or while it lasts, the Google cache:

http://216.239.41.104/search?q=cache:tCfemw3M-aUJ:www.whitehouse.gov/robots.txt
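
If you want to check the "iraq"/"text" pairing for yourself against the archived copy, here's a rough sketch (Python; the swap-"iraq"-for-"text" comparison is just my illustration of the pattern described above, and it assumes the archived file is still reachable at the sethf.com URL):

    # Hypothetical checker: list which "iraq" Disallow paths in the archived
    # robots.txt have a matching "text" counterpart, and which do not.
    import urllib.request

    URL = "http://sethf.com/domains/whitehouse-iraq/wh-robots.txt"

    with urllib.request.urlopen(URL) as resp:
        lines = resp.read().decode("utf-8", errors="replace").splitlines()

    disallowed = {
        line.split(":", 1)[1].strip()
        for line in lines
        if line.lower().startswith("disallow:")
    }

    for path in sorted(p for p in disallowed if "iraq" in p):
        counterpart = path.replace("iraq", "text")
        status = ("has matching 'text' entry"
                  if counterpart in disallowed else "no 'text' counterpart")
        print(f"{path:60s} {status}")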

By Seth Finkelstein | posted in infothought | on October 27, 2003 09:46 PM

Comments

Seth, I don't think you're right. You're looking at the poor execution of the directive, and taking that to mean there was different reasoning behind the decision.

I think they did exactly what you suggested they didn't, which is to hide every directory about Iraq. Instead of simply finding those directories and adding them, however (probably in addition to that, actually), they also played it safe (read: played it stupid) and made sure they missed no directories by replacing the "text" in every text-only directory listed with "iraq."

But the idea is the same.

Posted by: Jesse Berney at October 28, 2003 08:49 AM

Jesse, I'm thinking this way: WHY would someone make all those entries? What possible sane reason could they have to do it? What's the reasoning here?

I know this is a dangerous style of thinking to use in politics, but it's how I think.

I believe the "Never assume malice if it can be explained by stupidity" rule applies strongly to anything involving issues with computers.

I know, in politics, if your opponent sneezes, the standard thing to do is to accuse them of engaging in biological warfare. We saw that happen with the "Al Gore claimed he invented the Internet" smear.

But what evidence is there for malice rather than stupidity? The stupidity explanation fits better, because it explains why every "text" entry has a corresponding "iraq" replacement: the person was confused about what counts as an "index", given the presence of the database-index (htdocs) entries.

Whereas the malice explanation requires first that we assume they are doing a strange thing (trying to create a "memory hole"), then that they're using something which won't work, then that they made many, many weird entries just to play it safe. To me, this collapses under its own weight. It's too contorted a theory to make practical sense of the evidence, when there's a much simpler explanation which fits much better.

Posted by: Seth Finkelstein at October 28, 2003 09:29 AM

And see also this comment in Dan Gillmor's eJournal discussion:

At the Internet Archive, we just recently (before this speculation erupted) got word from the White House webmaster that they wanted us to do an extensive crawl of their site. See my blog entry for more details:

http://gojomo.blogspot.com/#106732065514107786

Their robots.txt is weird and suboptimal, no doubt, but given that I just saw them express a genuine desire to be crawled and archived a few days ago, the weirdness should have a completely innocuous explanation.

Posted by: Gordon Mohr on October 27, 2003 10:16 PM


Posted by: Seth Finkelstein at October 28, 2003 09:59 AM

I don't buy the "innocuous" idea, but I do think that it's, at best, very curious that they're trying to hide something. I don't think it's a conspiracy theory to point out when the government actually IS trying to hide information! :)

Here's the URL to my explanation: http://shock-awe.info/archive/000965.php

--Kynn

Posted by: Kynn Bartlett at October 28, 2003 03:23 PM