Along with Archive-It, the Internet Archive offers a variety of crawling and archiving services, including large domain, topic, and event-driven collections. Partners include the Library of Congress, the US National Archives, and the National Library of Australia. Please contact us for more information on these services.
Why would I subscribe to Archive-It instead of using the Wayback Machine at the Internet Archive?
How frequently can I archive Web sites?
Who gets access to the collections created in Archive-It?
How can I search the collections?
What types of institutions can subscribe to Archive-It?
Archive-It is a subscription service that allows institutions to build and preserve collections of born digital content. Through the user-friendly web application, Archive-It partners can harvest, catalog, manage, and browse their archived collections. Collections are hosted at the Internet Archive data center and are accessible to the public with full-text search.
Subscribers to this service can create distinct Web archives called "collections", containing only the born digital content they are interested in harvesting, at whatever frequency suits their needs. All collections are full-text searchable. The collections created with Archive-It can be catalogued and managed directly by the subscriber. We keep a minimum of two copies of each collection online. None of these features are currently available in the General Archive at www.archive.org.
Archive-It is very flexible: you can harvest material from the Web using nine (9) different frequencies, from daily to annual. Subscribers can select a different crawl frequency for each chosen URL. Your institution can also choose to start a crawl "on demand" in the case of an unforeseen or historic event.
By default, all collections are available for public access from the main page at www.archive-it.org. However, a subscriber can choose to have their collection(s) made private by special arrangement.
Archive-It provides full-text search capability for all public collections. You can also browse by URL from the list provided for each collection. The public can browse and search collections by partner type or collection from www.archive-it.org.
Archive-It is designed to fit the needs of many types of organizations and individuals. The 95+ partners include state archives, university libraries, federal institutions, state libraries, non-governmental nonprofits, museums, historians, and independent researchers.
Subscribers develop their own collections and have complete control over which content to archive within those collections.
All data created using the Archive-It service is hosted and stored by the Internet Archive. We store two copies online and are working with partners to keep redundant copies at the Bibliotheca Alexandrina in Egypt and at other locations in the U.S. Subscribers can also request a copy of their data for local use and preservation, delivered either on a hard drive or over the internet.
The Internet Archive is a 501(c)(3) non-profit that was founded in 1996 to build an 'Internet library,' with the purpose of offering permanent access for researchers, historians, and scholars to collections that exist in digital format.
Alexa Internet has been crawling the web since 1996, which has resulted in a massive archive. If you have a web site that you would like to ensure is saved for posterity in the Internet Archive, and you've searched the Wayback Machine and found no results, you can visit Alexa's "Webmasters" page: http://pages.alexa.com/help/webmasters/index.html#crawl_site
Method 2: if you have the Alexa toolbar installed, just visit a site.
Method 3: while visiting a site, use the 'show related links' in Internet Explorer, which uses the Alexa service.
Sites are usually crawled within 24 hours, and in no more than 48. Right now there is a 6-12 month lag between the date a site is crawled and the date it appears in the Wayback Machine.
The Internet Archive Wayback Machine contains over 115 billion pages and over 3 petabytes of data. The collection is currently growing at a rate of 20 terabytes per month. This eclipses the amount of text contained in the world's largest libraries, including the Library of Congress. If you tried to place the entire contents of the archive onto floppy disks (we don't recommend this!) and laid them end to end, it would stretch from New York, past Los Angeles, and halfway to Hawaii.
All questions about the Wayback Machine, or other Internet Archive projects, should be addressed to info at archive dot org. You can contact the Archive-It team by emailing archive-it at archive.org.
The Internet Archive Wayback Machine is a service that allows people to visit archived versions of Web sites. Visitors to the Wayback Machine can type in a URL, select a date range, and then begin surfing on an archived version of the Web. Imagine surfing circa 1999 and looking at all the Y2K hype, or revisiting an older version of your favorite Web site. The Internet Archive Wayback Machine can make all of this possible.