Sunday, April 6, 2008

Download complete websites for offline browsing with HTTrack

HTTrack is a free, open-source website ripper. It downloads a World Wide Web site from the Internet to a local directory, recursively building all directories and fetching HTML, images, and other files from the server to your computer. HTTrack preserves the original site's relative link structure: simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads. It is fully configurable through options and filters (include/exclude), and it has an integrated help system. HTTrack uses a web crawler to download a website, so by default it skips the parts of a site that are excluded by the robots exclusion protocol (robots.txt), unless you disable this in the program's options. HTTrack can also follow links that are generated with basic JavaScript and links inside Java applets or Flash.
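The robots exclusion protocol mentioned above is simple: a site publishes a robots.txt file listing path prefixes that crawlers should stay away from, and a polite crawler checks every URL against that list before fetching it. The sketch below illustrates the check with Python's standard `urllib.robotparser` module. It is not HTTrack's own code, and the robots.txt content and example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not fetched from a real server).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

# Hypothetical URLs a crawler might discover while following links.
CANDIDATES = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/notes.html",
    "http://www.example.com/cgi-bin/search?q=httrack",
]


def allowed_urls(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network access
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in allowed_urls(ROBOTS_TXT, CANDIDATES):
        print(url)  # only pages outside /private/ and /cgi-bin/ are printed
```

Disabling robots.txt handling in a mirroring tool amounts to skipping this check entirely, which is why it should only be done with great care.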




What not to do with HTTrack

Many webmasters are concerned about bandwidth abuse. Webmasters have to pay for the bandwidth their websites use; in other words, they pay for the bandwidth you consume while browsing. Offline browser tools like HTTrack can therefore be misused, and webmasters don't like their bandwidth to be abused by their visitors. Please remember these rules to avoid any network abuse.

Do not overload the websites. Downloading a site can overload it if you have a fast connection, or if you capture too many dynamically generated (CGI) pages at the same time.
  • Do not download overly large websites: use filters
  • Do not use too many simultaneous connections
  • Use bandwidth limits
  • Use connection limits
  • Use size limits
  • Use time limits
  • Only disable robots.txt rules with great care
  • Try not to download during working hours
  • Check your mirror transfer rate/size
  • For large mirrors, first ask the webmaster of the site
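To put the limits above into practice, here is a minimal Python sketch that builds an HTTrack command line with a filter, a cap on simultaneous connections, a bandwidth cap, a connections-per-second cap, a size cap and a time cap. The option letters (`-O`, `-c`, `-A`, `-%c`, `-M`, `-E`, `-s`) are taken from HTTrack's command-line help as I understand it, and the default numbers are arbitrary, hypothetical values; check `httrack --help` on your own installation before running anything. The script only prints the command.

```python
import shlex


def polite_httrack_args(
    url: str,
    output_dir: str,
    domain_filter: str,
    connections: int = 2,            # hypothetical value
    max_rate_bytes: int = 25_000,    # hypothetical value, bytes per second
    conn_per_second: int = 1,        # hypothetical value
    max_total_bytes: int = 50_000_000,  # hypothetical value, about 50 MB
    max_seconds: int = 3600,         # hypothetical value, one hour
) -> list[str]:
    """Build an HTTrack argument list that applies bandwidth, size and time limits.

    Assumption: the option letters match `httrack --help`; verify them
    against the version you have installed.
    """
    return [
        "httrack",
        url,
        "-O", output_dir,          # where the mirror is written
        f"+{domain_filter}",       # include filter: stay within one site
        f"-c{connections}",        # number of simultaneous connections
        f"-A{max_rate_bytes}",     # maximum transfer rate, bytes per second
        f"-%c{conn_per_second}",   # maximum connections per second
        f"-M{max_total_bytes}",    # maximum overall mirror size, bytes
        f"-E{max_seconds}",        # maximum mirror time, seconds
        "-s2",                     # always follow robots.txt rules
    ]


if __name__ == "__main__":
    args = polite_httrack_args(
        "http://www.example.com/",
        "./example-mirror",
        "*.example.com/*",
    )
    # Print the command for review instead of running it.
    print(shlex.join(args))
```

Building an argument list rather than one long string avoids shell-quoting problems with filters such as `+*.example.com/*`; once you are happy with the limits, the same list could be passed to `subprocess.run(args, check=True)`.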

Ensure that you can copy the website
  • Are the pages copyrighted?
  • Can you copy them only for private purposes?
  • Do not make online mirrors unless you are authorized to do so

Do not overload your network
  • Is your (corporate, private, ...) network connected through a dial-up ISP?
  • Is your network bandwidth limited (and expensive)?
  • Are you slowing down the traffic?
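Before starting a large mirror over a slow link, a quick back-of-the-envelope calculation shows how long the download will tie up your connection. This sketch assumes the link runs at its full nominal speed with no protocol overhead or latency, so real transfers will take longer; the 50 MB mirror size is a hypothetical figure.

```python
def transfer_seconds(size_bytes: int, link_kbit_per_s: float) -> float:
    """Seconds needed to move size_bytes over a link of the given nominal speed.

    Ignores protocol overhead and latency, so this is a lower bound.
    """
    bytes_per_second = link_kbit_per_s * 1000 / 8  # 1 kbit/s = 125 bytes/s
    return size_bytes / bytes_per_second


if __name__ == "__main__":
    mirror_size = 50_000_000  # hypothetical 50 MB mirror
    for label, kbit in [("56k dial-up", 56), ("1 Mbit/s line", 1000)]:
        hours = transfer_seconds(mirror_size, kbit) / 3600
        print(f"{label}: about {hours:.1f} h")
```

At a nominal 56 kbit/s (7,000 bytes per second), a 50 MB mirror keeps the line saturated for roughly two hours, which is why bandwidth and size limits matter on dial-up or metered connections.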

Do not steal private information
  • Do not grab emails
  • Do not grab private information

Source
