<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information Retrieval Blog &#187; crawler</title>
	<atom:link href="http://blog.zye.me/tag/crawler/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.zye.me</link>
	<description>REAL TIME DATA PROCESSING, DISTRIBUTED COMPUTING, PATTERN DISCOVERY</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:33:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Download Whole Website or Directories by using wget in Linux</title>
		<link>http://blog.zye.me/2009/09/54411.html</link>
		<comments>http://blog.zye.me/2009/09/54411.html#comments</comments>
		<pubDate>Sat, 19 Sep 2009 18:43:41 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[crawler]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[wget]]></category>

		<guid isPermaLink="false">http://blog.so8848.com/?p=54411</guid>
		<description><![CDATA[You may have searched for software that downloads a specific website or directory on Windows or Linux. Plenty of tools can do this, but on Linux a single command, wget, is enough. It <a href='http://blog.zye.me/2009/09/54411.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<h1>Download Whole Website or Directories by using wget in Linux</h1>
<p>You may have searched for software that downloads a specific website or directory on Windows or Linux. Plenty of tools can do this, but on Linux a single command, wget, is enough. It is highly customizable, essentially a powerful crawler, and you will find it really cool. Let me show you how!</p>
<pre><code>wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains techstroke.com \
     --no-parent \
     www.techstroke.com/Windows/</code></pre>
<p>The command above recursively downloads the <strong>/Windows/</strong> directory of <strong>techstroke.com</strong>, starting from the URL <strong>www.techstroke.com/Windows/</strong>.</p>
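<p>If you mirror different directories often, you could wrap the command in a small shell function. This is only a sketch, not part of the original recipe: the function name mirror_dir and the WGET variable (set it to echo for a dry run) are my own assumptions, and the --domains value is guessed by stripping the scheme, the path, and a leading "www." from the start URL.</p>

```shell
# mirror_dir: hypothetical bash wrapper around the wget command in this post.
# Usage: mirror_dir START_URL
# Set WGET=echo to print the arguments instead of downloading (dry run).
mirror_dir() {
    local url=$1 host domain
    host=${url#*://}      # drop "http://" or "https://" if present
    host=${host%%/*}      # keep only the host, e.g. www.techstroke.com
    domain=${host#www.}   # drop a leading "www.", e.g. techstroke.com
    "${WGET:-wget}" \
        --recursive \
        --no-clobber \
        --page-requisites \
        --html-extension \
        --convert-links \
        --restrict-file-names=windows \
        --domains "$domain" \
        --no-parent \
        "$url"
}
```

<p>For example, WGET=echo mirror_dir http://www.techstroke.com/Windows/ only prints the arguments, while mirror_dir http://www.techstroke.com/Windows/ starts the real download.</p>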
<p>How do you like it? Pretty cool, right?</p>
<p>Finally, let me explain the options a bit more. For full details, refer to the wget manual (man wget).</p>
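<p><strong>The options are:</strong></p>
<ul>
<li><strong>--recursive:</strong> follow links and download pages recursively, up to five levels deep by default (<strong>--level</strong> changes the depth).</li>
<li><strong>--domains techstroke.com:</strong> don’t follow links outside techstroke.com. This does not turn on <strong>--span-hosts</strong>, and wget stays on the starting host by default anyway.</li>
<li><strong>--no-parent:</strong> never ascend above the starting directory, /Windows/. Keep the trailing slash in the start URL so that wget treats Windows as a directory.</li>
<li><strong>--page-requisites:</strong> get all the elements needed to display each page (images, CSS, and so on).</li>
<li><strong>--html-extension:</strong> save HTML pages with the .html extension. Newer wget releases rename this option to <strong>--adjust-extension</strong> and still accept the old name.</li>
<li><strong>--convert-links:</strong> convert links so that they work locally, offline.</li>
<li><strong>--restrict-file-names=windows:</strong> modify file names so that they are also valid on Windows.</li>
<li><strong>--no-clobber:</strong> don’t overwrite existing files (useful when an interrupted download is resumed).</li>
</ul>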
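<p>The trailing slash really matters for --no-parent. As the wget manual explains, HTTP has no real notion of directories, so in www.techstroke.com/Windows (no slash) wget treats Windows as a file name whose parent is the site root, and --no-parent then no longer confines the crawl. The hypothetical helper below, a rough sketch that ignores query strings, prints the directory that --no-parent would keep wget inside.</p>

```shell
# noparent_root: hypothetical helper that prints the directory --no-parent
# would keep wget inside, for a given start URL. Rough sketch of the rule in
# the wget manual: whatever follows the last "/" is treated as a file name.
# Query strings and other edge cases are ignored.
noparent_root() {
    local url=$1 scheme=""
    # Set any "scheme://" prefix aside so its slashes are not counted
    case $url in
        *://*) scheme=${url%%://*}://; url=${url#*://} ;;
    esac
    case $url in
        */*) printf '%s%s/\n' "$scheme" "${url%/*}" ;;  # cut after the last "/"
        *)   printf '%s%s/\n' "$scheme" "$url" ;;       # bare host: site root
    esac
}
```

<p>With the start URL from the command above it prints www.techstroke.com/Windows/; drop the trailing slash and it prints www.techstroke.com/, which is the whole site.</p>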
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2009/09/54411.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

