From: "Uwe Schindler"
Reply-To: java-user@lucene.apache.org
Date: Fri, 16 Dec 2011 18:43:07 +0100
Subject: RE: Why is the old value still in the index

Hi,

> I'm adding documents to an index, at a later date I modify a document and
> update the index, close the writer and open a new IndexReader. My
> indexreader iterates over terms for that field and docFreq() returns one as I
> would expect, however the iterator returns both the old value of the document
> and the new value, I don't expect (or want) the old value to still be in the index,
> so why is this.

That is all as expected. Updating a document in a Lucene index (IndexWriter.updateDocument) is an atomic delete/add operation. Deleting in Lucene only marks the document as deleted; it is still physically in the index, but search results won't return it. The consequence is that all of its terms are still in the terms index, and document frequencies still count the deleted document as well as the new one. This *may* cause scoring problems in indexes with many deletes (those go away as merging removes the deleted documents, see below), but this is known (see the wiki, javadocs, ...).

As you keep adding documents, IndexWriter will merge segments, and merging makes the deleted documents disappear. If you really want to remove the old documents with all their terms right away (this is very expensive), you can call IndexWriter.forceMergeDeletes():

http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/index/IndexWriter.html#forceMergeDeletes()

The way inverted indexes work makes it impossible to update the terms index of an already written segment afterwards; only a merge, which writes a new segment without the deleted documents, can drop the old terms.

Uwe
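To make the "delete only sets a marker" behaviour concrete, here is a toy, single-segment model of an inverted index in plain Java. It is NOT Lucene's implementation or API: the class ToyInvertedIndex and its methods updateDocument, terms, docFreq and mergeDeletes are hypothetical stand-ins for IndexWriter.updateDocument, the terms enumeration, docFreq() and forceMergeDeletes(), assuming one segment that is simply rebuilt from its live documents on merge.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical toy model (not Lucene's API): one "segment" whose postings are never
 * edited in place. Deletes only set a bit, so old terms survive until a merge.
 */
public class ToyInvertedIndex {
    // field -> term -> doc ids containing that term (the "terms index" plus postings)
    private Map<String, TreeMap<String, List<Integer>>> postings = new TreeMap<>();
    // Deleted-docs marker: setting a bit here does not touch the postings above
    private BitSet deleted = new BitSet();
    // Stored fields, kept so a merge can re-index the live documents
    private List<Map<String, String>> stored = new ArrayList<>();
    private int maxDoc = 0;

    public int addDocument(Map<String, String> doc) {
        int docId = maxDoc++;
        stored.add(doc);
        for (Map.Entry<String, String> e : doc.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                    .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(docId);
        }
        return docId;
    }

    // "Update" = mark every doc containing field:value as deleted, then add the new doc
    public void updateDocument(String field, String value, Map<String, String> doc) {
        for (int docId : postings.getOrDefault(field, new TreeMap<>()).getOrDefault(value, List.of())) {
            deleted.set(docId);
        }
        addDocument(doc);
    }

    // All terms of a field, including terms used only by deleted docs
    public SortedSet<String> terms(String field) {
        return new TreeSet<>(postings.getOrDefault(field, new TreeMap<>()).keySet());
    }

    // Like docFreq(): counts deleted docs too
    public int docFreq(String field, String term) {
        return postings.getOrDefault(field, new TreeMap<>()).getOrDefault(term, List.of()).size();
    }

    public int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    // Rewrite the segment from live docs only; this is the only point where old terms vanish
    public void mergeDeletes() {
        List<Map<String, String>> live = new ArrayList<>();
        for (int i = 0; i < maxDoc; i++) {
            if (!deleted.get(i)) {
                live.add(stored.get(i));
            }
        }
        postings = new TreeMap<>();
        deleted = new BitSet();
        stored = new ArrayList<>();
        maxDoc = 0;
        for (Map<String, String> d : live) {
            addDocument(d);
        }
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.addDocument(Map.of("id", "42", "title", "old"));
        idx.updateDocument("id", "42", Map.of("id", "42", "title", "new"));
        System.out.println(idx.terms("title") + " numDocs=" + idx.numDocs()); // [new, old] numDocs=1
        System.out.println("docFreq(id:42)=" + idx.docFreq("id", "42"));      // docFreq(id:42)=2
        idx.mergeDeletes();
        System.out.println(idx.terms("title") + " numDocs=" + idx.numDocs()); // [new] numDocs=1
    }
}
```

In the toy, the "old" term stays enumerable and the shared id term is counted twice until mergeDeletes() rebuilds the postings. Real Lucene works with many segments and lets its merge policy decide when they are merged, so this only illustrates the principle, not the timing.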