lucene-dev mailing list archives

From Michael Ji <fji...@yahoo.com>
Subject Re: Document Duplication for Multiple Segment Merge
Date Fri, 14 Oct 2005 18:10:16 GMT
Sorry, I think I gave the wrong Java class name.

I want to confirm whether SegmentMerger.java in
Lucene does dedup or not. I traced through a couple of
Java classes from SegmentMerger.java, such as
SegmentReader.java and IndexWriter.java, and I haven't
seen a dedup mechanism yet.
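
For reference, the behavior being asked about (keep one document per MD5 of the content, preferring the higher document ID) could be sketched roughly as below. This is only an illustration under stated assumptions: the class `Md5DedupSketch` and its methods are hypothetical names, the list index stands in for the document ID, and none of this is code from Lucene or Nutch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene or Nutch.
final class Md5DedupSketch {

    // Hex-encoded MD5 of the document content (UTF-8 bytes).
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumption: contents.get(i) is the content of document ID i.
    // Returns a map from MD5 to the surviving document ID; the highest ID wins.
    static Map<String, Integer> survivors(List<String> contents) {
        Map<String, Integer> keep = new LinkedHashMap<>();
        for (int docId = 0; docId < contents.size(); docId++) {
            // IDs are visited in ascending order, so a later put
            // overwrites an earlier entry with the higher ID.
            keep.put(md5Hex(contents.get(docId)), docId);
        }
        return keep;
    }
}
```

With contents "a", "b", "a" at IDs 0, 1, 2, the sketch keeps ID 2 for "a" and ID 1 for "b". A step like this would have to run outside the merge, since, per the reply quoted below, Lucene itself does not dedup.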

thanks,

Michael Ji

--- Yonik Seeley <yseeley@gmail.com> wrote:

> Sorry, I've only briefly looked at Nutch, so you
> should ask on that mailing
> list.
> Lucene doesn't do deduping.
> 
> 
> -Yonik
> Now hiring -- http://tinyurl.com/7m67g
> 
> On 10/14/05, Michael Ji <fji_00@yahoo.com> wrote:
> >
> > hi Yonik:
> >
> > Does that mean when two documents have the same MD5
> > content in two different segments, IndexMerger.java will
> > keep both of them?
> >
> > When I look at the code of IndexSegment.java, it
> > handles MD5 deduping by keeping the one with the higher
> > document ID.
> >
> 



	
		