An anticensorware investigation by Seth Finkelstein

Abstract: This report examines how various censorware programs blacklist an extensive (100 Terabytes) web-site archive called the "Wayback Machine". The control requirements of censorware lead to this archive site being classified as a "Loophole" or as a "Proxy Avoidance System". The report discusses the censorware slippery-slope logic (or flying leap off a sharp cliff) that leads to suppressing such a digital library.


DISCLAIMER REGARDING CONTENT.--Nothing in this title or the amendments made by this title shall be construed to prohibit ... [schools or libraries] ... from blocking access ... to any content other than content covered by this title or the amendments made by this title.

-- The (misleadingly named, as it applies to all ages) Children's Internet Protection Act (CIPA)

The Wayback Machine is "a digital library of Internet sites", a web archive collection. It describes itself modestly as a site which "makes it possible to surf pages stored in the Internet Archive's web archive". This simple statement does not do it justice. The archive contains a huge historical collection of web pages from vast numbers of websites. Not mere megabytes, not just gigabytes, but (as of this writing) 100 Terabytes. It spans years of past web history, with enormous records of various websites as they evolved. This archived material is not only text; it also includes many images. And that comprehensiveness, that tremendous accumulation of knowledge, makes it a deadly LOOPHOLE in terms of censorware.

Simply put, consider the site as performing a function similar to the Google cache, but extending in web-time as well as web-space. Note that all web pages are served from the archive site itself. So a censorware program has a deep problem in determining whether the pages are to be prohibited or permitted, since all material originates from the site of the digital library. This forces an all-or-nothing dilemma. If this archive were not banned completely, it could be used to evade censorware (by looking up forbidden material there).
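To make the all-or-nothing dilemma concrete, here is a minimal sketch, assuming a filter that decides only by the hostname of the requested URL. The blacklist, the hostnames (`forbidden.example`, `archive.example`, `library.example`), and the archive's URL layout are all hypothetical, not taken from any actual censorware product or from the Wayback Machine itself.

```python
from urllib.parse import urlsplit

# Hypothetical hostname blacklist; not any real product's list or format.
BLACKLIST = frozenset({"forbidden.example"})


def is_blocked(url: str, blacklist: frozenset = BLACKLIST) -> bool:
    """Block or permit based only on the host the browser actually contacts."""
    host = urlsplit(url).hostname or ""
    return host in blacklist


# Direct request for a blacklisted page.
direct = "http://forbidden.example/page.html"
# Hypothetical archive URL: the archived address rides along in the path,
# but the host actually contacted is the archive.
via_archive = "http://archive.example/web/2001/http://forbidden.example/page.html"
# An unrelated archived page from a permitted site.
harmless = "http://archive.example/web/2001/http://library.example/"

print(is_blocked(direct))       # True
print(is_blocked(via_archive))  # False: the forbidden page gets through

# The only host-level "fix" is to blacklist the archive itself,
# which blocks every archived page, forbidden or not.
stricter = BLACKLIST | {"archive.example"}
print(is_blocked(via_archive, stricter), is_blocked(harmless, stricter))  # True True
```

Under this assumption the filter has only two choices for the archive host: let everything through, or block the entire library.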

Rage Against The (Wayback) Machine

Categories are tools. Like other tools, they are made to be used for particular purposes. When they are separated from those purposes, or turned to purposes for which they were not intended, they often are less useful and sometimes may be misunderstood.

-- The Websense Master Database: Categories

Filtering of internet sites has been implemented in response to the need to meet federal CIPA requirements and to address in-district concerns to limit some internet access. ...

Translation Sites/Proxy Avoidance: These sites allow internet users to beat the filtering system.

-- part of a school district censorware explanation

Given the above problem, it's no surprise that the Wayback Machine is not treated well by censorware.

N2H2's BESS has blacklisted the Wayback Machine as a LOOPHOLE site. This is their (almost entirely undocumented) term for sites which allow escape from the necessary control of censorware. This blacklist cannot be disabled in N2H2's system.

Websense does something similar to N2H2. The category which N2H2 calls LOOPHOLE is somewhat analogous to what Websense terms Proxy Avoidance Systems. That is:

13.2 Proxy Avoidance Systems. Sites that provide information on how to bypass proxy server features or to gain access to URLs in any way that bypasses the proxy server.

It should be noted that Websense puts language translators in a separate blacklist, and at least these blacklists are all documented. In theory, these blacklists might go unused. But in practice, what authority is going to permit ways "to beat the filtering system"?

So Websense blacklists the Wayback Machine as "Proxy Avoidance Systems". Given the description, one might think that such sites are somehow disreputable or have little value besides getting around the censorware. Few people would consider that an extensive historical web archive qualifies. Yet access to a digital library is denied by the imperatives of control, the necessities of the blinder-box.

SmartFilter has two different approaches. An older version (2.0) straightforwardly blacklists the Wayback Machine in every category (for an explanation of why this is done, see the report "SmartFilter's Greatest Evils"). A newer version (3.01) is somewhat bizarre. At different times, it has been observed to blacklist the Wayback Machine as "Anonymizer" and, oddly, as "Entertainment". I believe this happens because the site is not the same as . The site is the general organization website, while the site is the means of reading the digital library itself. SmartFilter 3.01 tends to be extremely stupid about distinguishing between different sub-domains of a website, by default blacklisting everything under a domain in the same way. That is, <anything> defaults to the same blacklisting. So between and , whatever entry was last entered for one was apparently inherited by the other. I assume SmartFilter will stop bouncing back and forth for these particular sites once this has been publicized (and people use to escape). But remember this crudeness and broad brush the next time a censorware maker touts its "list accuracy".
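The inheritance described above can be modeled with a minimal sketch, assuming a rule table keyed only on the parent domain (here, crudely, the last two hostname labels), so that any sub-domain shares whatever entry was written most recently. The hostnames `archive.example` and `web.archive.example` and the helper names are hypothetical; SmartFilter's actual data structures are not documented here.

```python
from typing import Dict, Optional

# Hypothetical rule table: parent domain -> category.
rules: Dict[str, str] = {}


def domain_key(host: str) -> str:
    """Collapse a hostname to its last two labels, discarding sub-domains."""
    return ".".join(host.lower().split(".")[-2:])


def set_category(host: str, category: str) -> None:
    # The entry is stored under the parent domain, so it overwrites any
    # entry previously made for a sibling sub-domain or the parent itself.
    rules[domain_key(host)] = category


def categorize(host: str) -> Optional[str]:
    return rules.get(domain_key(host))


# The digital-library reading interface is classified first...
set_category("web.archive.example", "Anonymizer")
# ...then the organization's main site is given a different label.
set_category("archive.example", "Entertainment")

print(categorize("web.archive.example"))  # Entertainment: the later entry wins
print(categorize("archive.example"))      # Entertainment
```

In this model, reversing the order of the two `set_category` calls would flip both hosts to "Anonymizer", which would match the back-and-forth observed above.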

In the fine print, SmartFilter defines "Anonymizer" as:

Anonymizers/Translators (an)

Anonymizers enable anonymous Web browsing through an intermediary to prevent unauthorized parties from gathering personal information. However, anonymizers also allow users access to ANY Web page and bypass blocking software. Language translators that provide input of whole URLs for translation also act like anonymizers. Language translators that translate only TEXT are not blocked.

In a way, this is the most honest declaration yet of what must be banned. It offers an insight into the censorware mentality. And of course, in this way of thinking, historical archives presumably "act like anonymizers". So they must be banned too.

The Pre-Slipped Slope

Slip sliding away, slip sliding away
You know the nearer your destination
The more you slip sliding away

-- "Slip Sliding Away", by Paul Simon

Almost all of the discussion of censorware takes place in what can be termed the "toxic material" model. That is, the censorware program is thought of as a "filter", something which acts to purify the unsafe Internet of dangerous, harmful, toxic material. One major effect of this idea is to focus any debate on the alleged toxicity of the prohibited material, taking for granted that the censorware program is in fact devoted to Internet filtering.

But this approach is not an accurate conception of censorware. Censorware is about control, not filtering. The goal of censorware is to construct an escape-proof blinder-box for what a person is allowed to read. There are profound implications in the technical requirements of maintaining such control over permitted reading material. These implications simply don't seem to penetrate the debate.

Censorware is an extensive demonstration of the "slippery slope" in suppressing speech. Note that the slippage is far worse than might be obvious. A classical slippery slope consists of subtly expanding the material which is subject to suppression. That is, we might start off with a supposed requirement of banning "obscene" material, a very narrow range of content considered unprotected by the First Amendment. Then - slip, slip, slip - suddenly material is banned not according to a complex legal standard, after a formal judicial hearing, but in prior restraint, merely as to whether it "meets the criteria" of a censorware company's vague definition as applied by a keyword-matching program.

However, the control requirements of censorware are an order of magnitude more expansive. If the above extension in banning is a slippery slope, we now discuss taking a flying leap off a sharp cliff. In the prohibition of the Wayback Machine, we have 100 Terabytes of archival material being denied because it's viewed as a possible way "to beat the filtering system". We started out with supposedly worthless, toxic material, and ended up banning whole digital libraries.


There seems to be little interesting to say about filtering software and ratings anymore -- it's an old and a tired, debate. ...

-- Declan McCullagh, journalist, Oct 30 2001, editorial comment (possibly not completely objective on this point)

When censorware bans the Wayback Machine, it is not a program accident or human error. It will not be fixed-when-exposed, or fixed-in-next-release, or fixed-when-AI-is-developed-someday. It is a logical outcome of the imperatives of control over all reading material. This "pre-slipped slope" results in the deliberate electronic book-burning of a unique, unparalleled digital library.

Version 1.0, Mar 13 2002

Update 1.1, March 23 2002 - Maybe they're reading. SmartFilter is now once again blacklisting the Wayback Machine as "Anonymizer".

