The Internet Archive.


I had heard about the Internet Archive before, but this time I spent a while exploring it. The Internet Archive has been building an internet library since it was founded in 1996 (!), and you can look up any site to see its history.

For example, go and have a look at how well-known sites used to look, or how they looked at the very beginning, such as Google in 1998, when it was hosted at Stanford University.

Google in 1998
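If you want to jump straight to an old snapshot, here is a minimal sketch that builds a Wayback Machine link. The URL pattern (web.archive.org/web/, then a timestamp, then the site) is my assumption based on how Wayback links usually look, and the helper name wayback_url and the Google address are only illustrative.

```python
def wayback_url(site: str, timestamp: str = "1998") -> str:
    """Build a Wayback Machine link for `site` near `timestamp`.

    Assumption: the timestamp is a run of 1 to 14 digits (YYYYMMDDhhmmss,
    possibly shortened), and the archive picks the closest snapshot it has.
    """
    if not (timestamp.isdigit() and 1 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 1 to 14 digits, e.g. '1998'")
    return f"https://web.archive.org/web/{timestamp}/{site}"


if __name__ == "__main__":
    # Hypothetical example address; swap in any site you are curious about.
    print(wayback_url("http://www.google.com/", "1998"))
```

Opening the printed link in a browser should land on the snapshot closest to the requested date, if one exists.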

That is also how I found out what the previous website and logo of the company I work for looked like.

And yes, you can use robots.txt to exclude a site from being archived.

To exclude the Internet Archive’s crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /
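
To check that these two lines do what they promise, here is a small sketch using Python's standard urllib.robotparser. It only shows how a standards-following parser reads the rules; it does not prove how any particular crawler behaves. The example.com address and the Googlebot name are just stand-ins I picked for the test.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/index.html"  # placeholder address
    print("ia_archiver allowed:", can_fetch("ia_archiver", page))
    print("Googlebot allowed:", can_fetch("Googlebot", page))
```

The archive's crawler is blocked from the whole site, while a robot with no matching rule falls through and stays allowed.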

Is it useful to archive the whole internet, or not…?