nutch-user mailing list archives

From kevin <kevin...@gmail.com>
Subject Re: HELP: Why crawled files so small? nutch version 0.8.1
Date Sat, 28 Oct 2006 03:40:22 GMT
Hi, Dennis,

Yes, I did.
Here is my nutch-site.xml:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>http.robots.agents</name>
  <value>www.linuxmine.com</value>
</property>
<property>
  <name>http.agent.name</name>
  <value>www.linuxmine.com</value>
</property>
<property>
  <name>http.agent.url</name>
  <value>www.linuxmine.com</value>
</property>

</configuration>



2006/10/11, Dennis Kubes <nutch-dev@dragonflymc.com>:
>
> Did you set the user agent name in the nutch-site.xml file?
>
> Dennis
>
> kevin wrote:
> > Why are the crawled files so small?
> > Total size: 12.4 KB
> >
> > I used this command:
> > ./nutch crawl urls -dir crawled -depth 20
> >
> > However, the website I crawled is not so small.
> >
> >
> >
> >
> > Regards!
>



-- 
kevin
