lucene-dev mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: Document Duplication for Multiple Segment Merge
Date Fri, 14 Oct 2005 17:58:57 GMT
Sorry, I've only briefly looked at Nutch, so you should ask on that mailing
list.
Lucene doesn't do deduping.


-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/14/05, Michael Ji <fji_00@yahoo.com> wrote:
>
> Hi Yonik,
>
> Does that mean that when two documents in two different
> segments have the same MD5 content hash, IndexMerger.java
> will keep both of them?
>
> When I look at the code of IndexSegment.java, it
> handles MD5 deduping by keeping the one with the higher
> document ID.
>
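
For illustration only, the rule described in the quoted message (when two documents share an MD5 content hash, keep the one with the higher document ID) might look roughly like the sketch below. This is not Nutch or Lucene code: the class Md5DedupSketch and the methods md5Hex and survivingDocIds are hypothetical names, and the sketch assumes document IDs are simply list positions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not taken from Nutch or Lucene.
final class Md5DedupSketch {

    // Returns the MD5 digest of the content as a lowercase hex string.
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: contents.get(i) is the content of document ID i.
    // Returns the IDs that survive deduplication, in ascending order.
    static List<Integer> survivingDocIds(List<String> contents) {
        Map<String, Integer> winnerByHash = new HashMap<>();
        for (int docId = 0; docId < contents.size(); docId++) {
            // IDs are visited in ascending order, so a later put()
            // overwrites an earlier one and the higher ID wins.
            winnerByHash.put(md5Hex(contents.get(docId)), docId);
        }
        List<Integer> ids = new ArrayList<>(winnerByHash.values());
        Collections.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        List<String> contents = Arrays.asList("alpha", "beta", "alpha", "gamma");
        // Documents 0 and 2 have identical content, so only 2 is kept.
        System.out.println(survivingDocIds(contents)); // prints [1, 2, 3]
    }
}
```

As the reply says, Lucene itself does not apply any such rule, so a step like this would have to live in the application built on top of it.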
