lucene-java-user mailing list archives

From "jacky" <jackych...@gmail.com>
Subject Re: duplicate fields
Date Fri, 08 Sep 2006 07:38:46 GMT
Hi Daniel,
   How do you use a separate database to check for duplicate fields?  That sounds interesting!
 
     Best Regards.
       jacky  
       
----- Original Message ----- 
From: "Daniel Noll" <daniel@nuix.com.au>
To: <java-user@lucene.apache.org>
Sent: Friday, September 08, 2006 3:08 PM
Subject: Re: duplicate fields


> jacky wrote:
> > hi, 1. Is there an effective way to check whether the same field
> > (holding a unique ID) already exists when adding a document to a
> > Lucene index? Should I search on this field?
> 
> One way is to create an IndexReader and IndexSearcher on your index, 
> which you reopen every now and then.  But we do this task by using a 
> separate database, for the sake of efficiency.
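[Editorial sketch] The thread does not say which database is used or how it is queried. As a rough illustration of the idea only, the sketch below keeps the set of already-indexed IDs outside the index and consults it before each add. The class name UniqueIdRegistry and the in-memory HashSet are assumptions made for this example; a real setup would presumably back the set with a persistent store.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for the "separate database" of unique IDs.
 * An in-memory set is used here purely for illustration.
 */
final class UniqueIdRegistry {
    private final Set<String> seenIds = new HashSet<>();

    /**
     * Records the ID if it is new.
     *
     * @return true if the ID was not seen before (safe to index the document),
     *         false if it is a duplicate (skip or update instead of adding)
     */
    synchronized boolean register(String id) {
        // Set.add returns false when the element is already present.
        return seenIds.add(id);
    }

    /** Checks for an ID without recording it. */
    synchronized boolean contains(String id) {
        return seenIds.contains(id);
    }
}
```

The indexing code would call register(id) first and add the document to the index only when it returns true, so no searcher has to be reopened just to answer "is this ID already there?".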
> 
> > 2. Is there an effective way to check whether duplicate fields
> > (holding a unique ID) exist in the Lucene index? Two methods come to
> > mind: read all documents and compare the fields, or search for each
> > field value. Is there a better one?
> 
> The simplest way without using an external database is to use the 
> termDocs enumeration.  For each term you can easily see which ones have 
> multiple documents, so every document other than the first for each term 
> is a duplicate (which you could then use to build a filter to remove 
> duplicates.)
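[Editorial sketch] The enumeration described above can be modelled without Lucene itself. In the sketch below, a sorted map from each unique-ID term to its ascending document numbers stands in for what IndexReader.terms() and IndexReader.termDocs(Term) would yield in the Lucene API of that period; the map, the class name DuplicateFinder, and the method name are assumptions for illustration, not code from the thread. Every document after the first for a term is marked in a BitSet, which could then serve as the basis for a filter.

```java
import java.util.BitSet;
import java.util.SortedMap;

/** Hypothetical helper: marks every document after the first for each term. */
final class DuplicateFinder {

    /**
     * @param postings unique-ID term text mapped to its document numbers in
     *                 ascending order (a stand-in for the term/termDocs walk)
     * @return a BitSet with one bit set per duplicate document
     */
    static BitSet findDuplicates(SortedMap<String, int[]> postings) {
        BitSet duplicates = new BitSet();
        for (int[] docs : postings.values()) {
            // Index 0 is the first document for this term and is kept;
            // every later document carries the same ID and is a duplicate.
            for (int i = 1; i < docs.length; i++) {
                duplicates.set(docs[i]);
            }
        }
        return duplicates;
    }
}
```

For example, if one ID term maps to documents 0, 3 and 7, documents 3 and 7 are marked while document 0 is kept. A term with a single document contributes nothing to the result.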
> 
> Daniel
> 
> 
> 
> -- 
> Daniel Noll
> 
> Nuix Pty Ltd
> Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
> Web: http://www.nuix.com.au/                        Fax: +61 2 9212 6902
> 
> 
> 