nutch-dev mailing list archives

From Doğacan Güney <dogacan.gu...@agmlab.com>
Subject Re: RSS fetcher and indexing individual items: how can I realize this function?
Date Tue, 06 Feb 2007 08:35:24 GMT
Hi,

Doug Cutting wrote:
> Doğacan Güney wrote:
>> I think it would make much more sense to change parse plugins to take
>> content and return Parse[] instead of Parse.
>
> You're right.  That does make more sense.

OK, then should I go ahead and implement this? It should be pretty easy,
though I am not sure what to use as keys for a Parse[].
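
To make the proposed interface change concrete, here is a minimal sketch in Python (Nutch itself is Java). `Content`, `Parse`, `html_parser`, `as_multi` and `feed_parser` are simplified, hypothetical stand-ins, not the real Nutch classes, and the "feed" is assumed to be plain text with one item per line. It only shows the shape of the contract: one fetched `Content` in, a list of `Parse` objects out. An existing single-result parser can be adapted by returning a one-element list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Content:
    # Simplified stand-in for fetched content (not Nutch's Content class).
    url: str
    data: str


@dataclass
class Parse:
    # Simplified stand-in for a parse result (not Nutch's Parse class).
    title: str
    text: str


# Old contract: one Content -> exactly one Parse.
SingleParser = Callable[[Content], Parse]
# Proposed contract: one Content -> a list of Parse objects (the Parse[] idea).
MultiParser = Callable[[Content], List[Parse]]


def html_parser(content: Content) -> Parse:
    # Hypothetical existing parser that produces a single result per document.
    return Parse(title=content.url, text=content.data)


def as_multi(parser: SingleParser) -> MultiParser:
    # Adapt an old-style parser to the new contract with a one-element list.
    return lambda content: [parser(content)]


def feed_parser(content: Content) -> List[Parse]:
    # Hypothetical feed parser: assumes one item per non-empty line.
    items = [line.strip() for line in content.data.splitlines() if line.strip()]
    return [Parse(title=item, text=item) for item in items]
```

For example, `feed_parser(Content("http://example.com/feed.xml", "first item\nsecond item\n"))` yields two `Parse` objects, while `as_multi(html_parser)` keeps a single-document parser working under the new contract.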

I mean, when getParse returned a single Parse, ParseSegment output it
as <url, Parse>. But if getParse returns an array, what will be the key
for each element?
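
A small sketch of why the key question matters, again in Python with hypothetical names (`emit_pairs` and `collapse_by_key` are not Nutch methods; a dict stands in for any later step that looks records up by key). With one Parse per URL, the URL is a unique key. If every element of a Parse[] simply reused the feed URL, all items from the feed would share one key, and a key-based step would keep only one of them.

```python
from typing import Dict, List, Tuple


def emit_pairs(url: str, parses: List[str]) -> List[Tuple[str, str]]:
    # Naive output: every parse produced from this URL is emitted under the URL itself.
    return [(url, parse) for parse in parses]


def collapse_by_key(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    # Stand-in for any key-based step: a later pair overwrites an earlier one.
    table: Dict[str, str] = {}
    for key, value in pairs:
        table[key] = value
    return table
```

With a single parse, `collapse_by_key(emit_pairs(url, ["page text"]))` keeps it; with `["item A", "item B"]` from one feed, only `"item B"` survives under the shared key.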

Something like <url#i, Parse[i]> may work, but this may cause problems
in dedup. For example, assume we fetched the same RSS feed twice and
indexed the two fetches in different indexes. The two versions' url#0
may be different items, but since they have the same key, dedup will
delete the older one.
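
The dedup scenario can be modelled in a few lines. This is a simplified sketch, not Nutch's dedup code: `IndexEntry`, `index_feed` and `dedup_keep_newest` are hypothetical, and dedup is assumed to keep only the most recently fetched entry per key, as described above. The feed is assumed to have rotated completely between the two fetches.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IndexEntry:
    # Hypothetical index record: key, fetch time, and the item's text.
    key: str
    fetch_time: int
    text: str


def index_feed(url: str, items: List[str], fetch_time: int) -> List[IndexEntry]:
    # Key the i-th item as "url#i", the positional scheme discussed above.
    return [IndexEntry(f"{url}#{i}", fetch_time, item) for i, item in enumerate(items)]


def dedup_keep_newest(entries: List[IndexEntry]) -> List[IndexEntry]:
    # Simplified key-based dedup: for each key, keep only the most recent fetch.
    newest: Dict[str, IndexEntry] = {}
    for entry in entries:
        current = newest.get(entry.key)
        if current is None or entry.fetch_time > current.fetch_time:
            newest[entry.key] = entry
    return sorted(newest.values(), key=lambda e: e.key)


feed = "http://example.com/feed.xml"
old_index = index_feed(feed, ["item A", "item B"], fetch_time=1)
new_index = index_feed(feed, ["item C", "item D"], fetch_time=2)
survivors = dedup_keep_newest(old_index + new_index)
```

Here `survivors` holds only "item C" and "item D". Items A and B are distinct documents, yet they are dropped because their positional keys (#0, #1) match those of the newer fetch: the key encodes the item's position in the array, not the item itself.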

--
Doğacan Güney

>
> Doug

