The Internet Archive.

I had heard about the Internet Archive before, but this time I spent a while exploring it. The Internet Archive is building an Internet library (it was founded in 1996!), and you can look up any site to see its history.

For example, go and have a look at what well-known sites looked like in the past, or at the very beginning. E.g. Google in 1998, hosted at Stanford University.

Google in 1998

Or, as I did, find out what the previous website and logo of the company I work for looked like.

And yes, using robots.txt you can exclude a site from being archived.

To exclude the Internet Archive’s crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /
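
If you want to check what these two lines actually do before putting them on your site, here is a small sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the example.com URL, and the choice of "Googlebot" as the "other robot" are my own assumptions for illustration. It only shows how a parser that follows the robots.txt rules reads the file; it says nothing about how the Internet Archive's crawler itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the post, as a string.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under these rules.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/some/page.html"  # placeholder URL
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Run as a script, it should report that ia_archiver is not allowed while the other agent is, because no rule in the file applies to any other robot.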

Useful or not to archive the whole internet…?
