
Error 404? Retrieve Deleted Web Pages With These Browser Extensions

What do you do when you arrive at a webpage that says Error 404 – Not Found? Do you click your tongue and close the page? Wait, not so fast. There is a good possibility that the deleted page is cached somewhere on the web. You just have to know where to look.

Error 404 page of Lightpost Creative

The best place to start is Google or, for that matter, any search engine. When search engines crawl websites to index pages, they save a copy of each page on their own servers. (That is part of why Google can return results in the blink of an eye: it searches its own stored index, not the live web.) The cached copy is accessible from the search results page. Previously, the link to the cached copy was in plain sight; now it has been moved under Instant Preview. Either way, it’s there, and clicking the link retrieves the copy of the page from Google’s servers even if the website owner has deleted the original. This works not only for Google but for Bing too.
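
If you’d rather skip the search results page, the cached copy can also be requested directly by address. Below is a minimal Python sketch, assuming the historical `webcache.googleusercontent.com/search?q=cache:` address format. Google has since retired its public cache, so treat this as illustrative: the address may no longer resolve. `google_cache_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumption: historical address format of Google's cache pages.
# Google has since retired this service, so the result may not resolve today.
GOOGLE_CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def google_cache_url(page_url: str) -> str:
    """Return the address of Google's cached copy of page_url (hypothetical helper)."""
    # Percent-encode the whole page URL (':', '/', '?', '=' included) so it
    # travels safely inside the q= query parameter.
    return GOOGLE_CACHE_PREFIX + quote(page_url, safe="")


if __name__ == "__main__":
    print(google_cache_url("http://example.com/old-post.html"))
```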


However, not every page has a cached copy. Pages published very recently, say within the last few minutes or hours, may not have been crawled yet. Pages deleted a long time ago might not have one either: Googlebot periodically recrawls websites to re-index them, and pages it finds missing or deleted are eventually dropped from the cache as well.

For really old pages, the best place to look is the Internet Archive’s Wayback Machine.


The Wayback Machine, according to the description found on Wikipedia, “is a digital time capsule created by the Internet Archive non-profit organization, based in San Francisco, California. It is maintained with content from Alexa Internet. The service enables users to see archived versions of web pages across time, which the Archive calls a three dimensional index.”

The Wayback Machine’s cache is not as vast as Google’s, but it indexes most moderate to large websites. One major drawback is that snapshots may not show up until 6 months or more after they are captured, and in some cases 24 months or longer. However, things have started to look better since the site went through a major redesign last year: snapshots now appear within a few hours to a few days. The frequency of snapshots still varies, though, so not every update to a tracked website is recorded, and gaps of several weeks or even years sometimes occur.
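
The Internet Archive also offers a simple availability API that returns, as JSON, the snapshot closest to a given date. Below is a minimal Python sketch, assuming the `https://archive.org/wayback/available` endpoint and its `archived_snapshots.closest` response shape. The helper names are hypothetical, and the network call is kept in its own function so the parsing can be tried offline.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the availability-API request; timestamp is YYYYMMDD[hhmmss]."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the capture closest to this date
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest archived copy's URL, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Query the live API (needs network access)."""
    with urlopen(availability_query(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo with a response shaped like the API's JSON.
    sample = ('{"archived_snapshots": {"closest": {"available": true, "status": "200", '
              '"timestamp": "20130919044612", '
              '"url": "http://web.archive.org/web/20130919044612/http://example.com/"}}}')
    print(closest_snapshot(sample))
```

Passing an old timestamp such as `"20100101"` asks for the capture nearest that date, which helps when snapshots of a page are spaced weeks or years apart.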

Obviously, you want easier access to these services, and the Web Cache extension for Chrome provides exactly that.


Web Cache lets you quickly search for the missing page in a number of places, including the Internet Archive Wayback Machine, Google Cache, Yahoo Cache, Bing Cache, CoralCDN, Gigablast and WebCite.

Of the supported services, the only ones worth using are Google Cache, Yahoo Cache, Bing Cache and the Wayback Machine. CoralCDN is a content distribution network (CDN), and I don’t see how that helps here. Gigablast is an obscure search engine that is no good, and WebCite is a Wayback Machine-style service with limited reach.
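
What Web Cache does amounts to a fan-out: build a lookup address for each service and open the first one that has a copy. Below is a minimal Python sketch of that idea, not the extension’s actual code. It assumes that `web.archive.org/web/<url>` redirects to the latest capture and uses the historical Google cache address (that service has since been retired). The `LOOKUPS` table and all helper names are hypothetical. Only two services are included for brevity, and the existence check is passed in as a function so the ordering logic can be tested without network access.

```python
from typing import Callable, Dict, Iterable, List, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen


def _google_cache(u: str) -> str:
    # Assumption: historical Google cache address format (service since retired).
    return "https://webcache.googleusercontent.com/search?q=cache:" + quote(u, safe="")


def _wayback_latest(u: str) -> str:
    # Assumption: /web/<url> without a timestamp redirects to the latest capture.
    return "https://web.archive.org/web/" + u


# Hypothetical lookup table: service name -> function building its lookup URL.
LOOKUPS: Dict[str, Callable[[str], str]] = {
    "Google Cache": _google_cache,
    "Wayback Machine": _wayback_latest,
}


def candidate_urls(page_url: str, services: Iterable[str] = tuple(LOOKUPS)) -> List[str]:
    """Build one lookup URL per service, keeping the order given."""
    return [LOOKUPS[name](page_url) for name in services]


def first_available(page_url: str, has_copy: Callable[[str], bool],
                    services: Iterable[str] = tuple(LOOKUPS)) -> Optional[str]:
    """Return the first lookup URL for which has_copy() is True, else None."""
    for url in candidate_urls(page_url, services):
        if has_copy(url):
            return url
    return None


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Rough existence check (network required): any final status below 400 counts."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except OSError:  # HTTPError, URLError and socket timeouts all derive from OSError
        return False
```

With network access, `first_available("http://example.com/old-post.html", http_ok)` would try Google Cache first and then the Wayback Machine. Some servers answer HEAD requests differently from GET, so `http_ok` is only a rough check.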


For Firefox, there is just one add-on, Gcache+, that handles missing (Error 404) pages, and it searches only Google’s cache. Another add-on, Resurrect Pages, supports the same set of web caches as the Chrome extension, but it works only for pages that couldn’t be reached because of a network problem or an offline server, not for missing pages. Yet another, ErrorZilla Mod, has the same limitation.

Nothing for Internet Explorer or Opera.
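
The distinction these add-ons draw, between a page the server reports as gone and a server that can’t be reached at all, is easy to check yourself before hunting through the caches. Below is a minimal Python sketch using the standard library’s `urllib`. `classify_status`, `check` and the labels they return are hypothetical, and grouping 410 Gone with 404, and 5xx responses with connection failures, are assumptions of this sketch.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen

GONE_CODES = {404, 410}  # the server answered, and the page is missing


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status (None = no response at all) to a recovery hint."""
    if status is None or status >= 500:
        return "unreachable"  # network problem or server down
    if status in GONE_CODES:
        return "missing"      # deleted page: look in the caches or the Wayback Machine
    if status >= 400:
        return "other-error"  # e.g. 403 Forbidden
    return "ok"


def check(url: str, timeout: float = 10.0) -> str:
    """Request url and classify the outcome (needs network access)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return classify_status(resp.status)
    except HTTPError as err:  # the server replied with a 4xx/5xx status
        return classify_status(err.code)
    except OSError:  # DNS failure, refused connection, timeout
        return classify_status(None)
```

Note that some sites serve their "not found" page with a 200 status (a so-called soft 404), which this check would report as `"ok"`.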

Comments

  1. Thanks for your research, and sharing it. love this blog!! it's like the type of lifehacker posts that I love the most, all put into one place! and unique content usually not found on lh either!! :).

    You have a new subscriber :).

  2. What to do when none of these options work? Is there any way to view/retrieve a web page that has been removed from google cache & isn't on wayback machine, the internet archive?


