lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rod Giles" <Rod.Gi...@ventyx.com>
Subject FW: Eliminating duplicate documents when indexing
Date Thu, 04 Oct 2007 00:29:28 GMT
Duplicate Documents In An Index

The updateDocument method of Index Writer indicates that a delete term
occurs before the update

document takes place (i.e. the document is replaced in the index, but
not duplicated).    Has anyone

been able to get this process to work?  The term that I am using has a
unique key that comes directly

from the primary key of a database table.   But, my updated documents
are still consistently duplicated

in my index when the writer is eventually flushed.   I am tried using
Lucene 2.2 and nightly build

Lucene-2007-10-02_02-29-37 (which correctly includes the
setRamBufferSizeMB() method for Index Writer).

 

 

Optimizing An Index

Also, in the nightly build lucene-2007-10-02_02-29-37, the optimize()
method of Index Writer appears to

have been broken.  My existing code, which worked with Lucene 2.2,
consistently throws an illegal argument

exception when this method is executed.

 

 

 



DISCLAIMER:
Please note that our email and web site addresses have changed.
************************************************************************
This email message and all attachments transmitted with it are for the sole use of the intended
recipient(s) and may contain confidential and privileged information. Please DO NOT forward
this email outside of the recipient's Company unless expressly authorized to do so herein.
Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the
intended recipient, please contact the sender by reply email and destroy all copies of the
original message.
Any views expressed in this email message are those of the individual sender except where
the sender specifically states them to be the views of Ventyx.
************************************************************************

Mime
  • Unnamed multipart/alternative (inline, 7-Bit, 0 bytes)
View raw message