lucene-java-user mailing list archives

From Daniel Noll <dan...@nuix.com.au>
Subject Re: Duplicate records in index
Date Wed, 08 Feb 2006 23:50:55 GMT
Pasha Bizhan wrote:
> Hi, 
> 
>> From: Anton Potehin [mailto:anton@orbita1.ru] 
>> 1) create Document object 
>>
>> 2) add 5 fields into Document (id, name, field1, field2, 
>> field3). All fields are stored, indexed and tokenized 
>>
>> 3) check if the document with current id and name was added before 
> 
> Just perform the search with given id and name values. 
> String query = "+id:(" + doc.get("id") + ") +name:(" + doc.get("name") +
> ")";

I'm not sure how efficient this would be.  A searcher only sees the 
index as it was when the searcher was opened, so if you did it that way 
you would have to re-open the index for every single document you add; 
otherwise you might miss a duplicate which was added recently.

Really it depends on when you need to know that it's a duplicate.  If 
you don't need to know right away, then you might as well use a query.  
If you need to know right away, then you're better off keeping a 
separate store of the IDs and names which have been added, such as an 
SQL database.
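
As a rough sketch of the "other store" idea, here is an in-memory 
version.  It is only an illustration: the class name SeenDocuments and 
the method markIfNew are made up for this example and are not part of 
Lucene, and it assumes that the id and name values never contain the 
NUL character used as a separator.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of Lucene): remembers which
 * (id, name) pairs have already been added to the index.
 */
final class SeenDocuments {
    // Assumption: neither id nor name ever contains this character,
    // so joining the two values with it cannot produce a collision.
    private static final char SEPARATOR = '\u0000';

    private final Set<String> seen = new HashSet<>();

    /**
     * Records the pair and returns true if it has not been seen before;
     * returns false, changing nothing, if it is a duplicate.
     */
    synchronized boolean markIfNew(String id, String name) {
        return seen.add(id + SEPARATOR + name);
    }

    /** Number of distinct (id, name) pairs recorded so far. */
    synchronized int size() {
        return seen.size();
    }
}
```

You would call markIfNew(doc.get("id"), doc.get("name")) just before 
adding the document, and skip the add when it returns false.  The set 
is lost when the process exits, so for anything long-lived a 
persistent store (for example an SQL table with a unique key on id 
and name) would have to take its place.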

Daniel



