nutch-user mailing list archives

From xiao yang <yangxiao9...@gmail.com>
Subject Re: Generate of Segments
Date Tue, 02 Feb 2010 12:50:38 GMT
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
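
Note that, as far as I know, -topN only caps the segment at the N
highest-scoring URLs that are due for fetching; it does not by itself put
uncrawled or longest-waiting URLs first. The ordering you describe could be
sketched roughly as below. This is plain Python to illustrate the selection
policy, not Nutch code: UrlRecord, select_segment and their fields are
hypothetical names, and the assumption is that each URL carries a last fetch
time and a refetch interval.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UrlRecord:
    """Hypothetical stand-in for one crawl database entry."""
    url: str
    last_fetch: Optional[int]  # epoch seconds; None means never crawled
    interval: int              # seconds to wait before a recrawl is due


def select_segment(records: List[UrlRecord], now: int, size: int = 1000) -> List[UrlRecord]:
    """Pick at most `size` URLs: uncrawled first, then the longest overdue."""

    def due(record: UrlRecord) -> int:
        # Time at which a previously fetched URL becomes due again.
        return record.last_fetch + record.interval

    # Keep only URLs that were never crawled or whose recrawl is already due.
    candidates = [r for r in records if r.last_fetch is None or due(r) <= now]

    # Uncrawled URLs sort first; the rest are ordered by earliest due time,
    # so the URL that has waited longest comes first.
    candidates.sort(
        key=lambda r: (r.last_fetch is not None, 0 if r.last_fetch is None else due(r))
    )
    return candidates[:size]
```

With size=1000 this yields a fixed-size list in the order you asked for;
URLs whose recrawl is not yet due are left out entirely.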


On Mon, Feb 1, 2010 at 9:58 PM, Tom Landvoigt <tom.landvoigt@linklift.de> wrote:
> Hi,
>
>
>
> I am using Nutch-1.0 mainly for crawling.
>
>
>
> I want to generate segments with a fixed size, e.g. 1000 URLs. Each
> segment should contain only uncrawled URLs and the URLs that have been
> waiting longest for recrawling.
>
>
>
> Can anyone give me a hint where I should tackle the problem?
>
>
>
> Thanks a lot
>
>
>
> Tom
>
>
