nutch-dev mailing list archives

From "Gal Nitzan" <gnit...@usa.net>
Subject Re: RSS-fecter and index individul-how can i realize this function
Date Tue, 06 Feb 2007 09:42:54 GMT
Hi,

IMO it should stay the same.

Keep the URL as the key; in the filter, each item's link element becomes the
key for that item.

I will be happy to convert the current parse-rss filter to the suggested
implementation.
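
A minimal sketch of how the two keying schemes discussed in this thread differ, assuming a feed whose items shift position between fetches. Everything here is hypothetical and for illustration only: `ParseKeys`, `Item`, `keyByIndex` and `keyByLink` are not Nutch classes or methods, the example.com URLs are made up, and plain strings stand in for `Parse` objects. Whether the feed URL itself also gets an entry is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParseKeys {

    /** Hypothetical stand-in for one parsed feed item; not a Nutch class. */
    static final class Item {
        final String link;
        final String text;

        Item(String link, String text) {
            this.link = link;
            this.text = text;
        }
    }

    /** Index-based keys (url#i): the item's position in the feed decides its key. */
    static Map<String, String> keyByIndex(String feedUrl, List<Item> items) {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < items.size(); i++) {
            out.put(feedUrl + "#" + i, items.get(i).text);
        }
        return out;
    }

    /** Link-based keys: each item's own link element is its key. */
    static Map<String, String> keyByLink(List<Item> items) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Item item : items) {
            out.put(item.link, item.text);
        }
        return out;
    }

    public static void main(String[] args) {
        String feed = "http://example.com/feed.rss";

        // First fetch: the feed holds only story A.
        List<Item> first = new ArrayList<>();
        first.add(new Item("http://example.com/a", "story A"));

        // Second fetch: story B was added on top, pushing story A down.
        List<Item> second = new ArrayList<>();
        second.add(new Item("http://example.com/b", "story B"));
        second.add(new Item("http://example.com/a", "story A"));

        // With url#i, the key feed#0 names a different item in each fetch.
        System.out.println(keyByIndex(feed, first).get(feed + "#0"));
        System.out.println(keyByIndex(feed, second).get(feed + "#0"));

        // With link keys, story A keeps the same key in both fetches.
        System.out.println(keyByLink(first).get("http://example.com/a"));
        System.out.println(keyByLink(second).get("http://example.com/a"));
    }
}
```

With link keys, two fetches of the same feed agree on which entries are the same item, so a dedup step keyed on URL would only drop true duplicates.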

Gal.

------ Original Message ------
Received: Tue, 06 Feb 2007 10:36:03 AM IST
From: Doğacan Güney <dogacan.guney@agmlab.com>
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function

> Hi,
> 
> Doug Cutting wrote:
> > Doğacan Güney wrote:
> >> I think it would make much more sense to change parse plugins to take
> >> content and return Parse[] instead of Parse.
> >
> > You're right.  That does make more sense.
> 
> OK, then should I go forward with this and implement something? This
> should be pretty easy, though I am not sure what to give as keys to a
> Parse[].
> 
> I mean, when getParse returned a single Parse, ParseSegment output it
> as <url, Parse>. But if getParse returns an array, what will be the
> key for each element?
> 
> Something like <url#i, Parse[i]> may work, but this may cause problems
> in dedup. For example, assume we fetched the same RSS feed twice and
> indexed the results in different indexes. The two versions' url#0 may be
> different items, but since they have the same key, dedup will delete
> the older one.
> 
> --
> Doğacan Güney
> 
> >
> > Doug