nutch-user mailing list archives

From Ian Reardon <irnu...@gmail.com>
Subject Re: How does this sound
Date Fri, 13 May 2005 18:44:01 GMT
I have created individual url-filters that specify exactly which pages I
want from each site, and then I wrote a script to switch the different
filters in and out when I crawl. That way I'm sure never to go off
site.
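
A minimal sketch of how such a per-site filter behaves. It assumes the +/- regular-expression rule style of Nutch's crawl-urlfilter.txt, where the first matching rule decides and a URL that matches no rule is rejected. The helper names make_site_rules and accepts, and the use of Python's re module, are illustrative assumptions, not Nutch's own code.

```python
import re


def make_site_rules(domain: str) -> list[str]:
    """Hypothetical per-site rules in the +/- style of crawl-urlfilter.txt.

    Accept http(s) URLs on `domain` or any of its subdomains; the final
    "-." rule rejects every other URL.
    """
    host = re.escape(domain)
    return [
        rf"+^https?://([a-z0-9-]*\.)*{host}/",
        "-.",
    ]


def accepts(rules: list[str], url: str) -> bool:
    """Apply rules in order: the first rule whose pattern matches decides.

    Patterns are searched anywhere in the URL (hence the ^ anchor above);
    a URL that matches no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    rules = make_site_rules("example.com")
    for url in ["http://www.example.com/about.html", "http://other.org/"]:
        print(url, accepts(rules, url))
```

A switching script along these lines could write one site's rules, one per line, over the active filter file before that site's crawl. Where that file lives depends on the Nutch setup, so no path is assumed here.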

On 5/13/05, EM <emilijan@cpuedge.com> wrote:
> Sounds fine to me, although more experienced people here may have
> a different opinion.
> 
> One small thing: if you are setting up each site individually, then
> fully disable the spidering. That way, you can inject the individual
> sites yourself.
> 
> Good luck,
> Emilijan
> Ian Reardon wrote:
> 
> >I am going to crawl a small set of sites. I never want to go off
> >site, and I also want to strictly control my link depth.
> >
> >I set up crawls for each site using the crawl command, then manually
> >move the segments folder to my "master" directory and re-index. (This
> >can all be scripted.) This gives me the flexibility to QA each
> >individual crawl.
> >
> >Am I jumping through unnecessary hoops here or does this sound like a
> >reasonable plan?
> >
> >
>
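
The plan in this thread (one crawl per site, never leaving the site, a fixed link depth) and Emilijan's suggestion to disable spidering can be sketched as a breadth-first crawl with a depth cap and a host check. This is a hypothetical in-memory illustration, not Nutch's implementation: links stands in for fetched pages, and crawl_order and allowed_host are made-up names. Nutch's crawl command instead bounds depth roughly by the number of generate/fetch/update rounds it runs.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(seeds, links, max_depth, allowed_host):
    """Return, in fetch order, the URLs a depth-capped, single-site crawl visits.

    seeds:        injected start URLs; they count as depth 1.
    links:        hypothetical in-memory link graph, url -> list of outlinks.
    max_depth:    1 fetches only the seeds (spidering disabled).
    allowed_host: exact hostname; URLs on any other host are never queued.
    """
    if max_depth < 1:
        return []

    def on_site(url):
        return urlparse(url).hostname == allowed_host

    fetched = []
    seen = set()
    queue = deque()
    for url in seeds:
        if on_site(url) and url not in seen:
            seen.add(url)
            queue.append((url, 1))

    while queue:
        url, depth = queue.popleft()
        fetched.append(url)
        if depth >= max_depth:
            continue  # depth cap reached: do not follow this page's links
        for out in links.get(url, []):
            if on_site(out) and out not in seen:
                seen.add(out)
                queue.append((out, depth + 1))
    return fetched
```

With max_depth=1 only the injected seeds are fetched, which corresponds to fully disabling the spidering; larger values follow on-site links one level further per step.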
