Subject: RE: Why is the old value still in the index
Date: Fri, 16 Dec 2011 17:32:13 -0000
From: "Austin, Carl"

The docFreq() call returns the number of documents that contain the current term in the enum, not a count across all terms in the term enum.
Also be aware of this, from the Lucene wiki: "Once a document is deleted it will not appear in TermDocs nor TermPositions enumerations, nor any search results. Attempts to load the document will result in an exception. The presence of this document may still be reflected in the docFreq statistics, and thus alter search scores, though this will be corrected eventually as segments containing deletions are merged."

You can check more accurately by using TermDocs if you need to.

-----Original Message-----
From: Paul Taylor [mailto:paul_t100@fastmail.fm]
Sent: 16 December 2011 17:20
To: Ian Lea
Cc: java-user@lucene.apache.org
Subject: Re: Why is the old value still in the index

On 16/12/2011 17:10, Ian Lea wrote:
> Shouldn't
>
>     iw.updateDocument(new Term(FIELD1,"term1"),document);
>
> be
>
>     iw.updateDocument(new Term(FIELD1,"test"),document);
>
> if you want to replace the first doc?

Hmm, you are right. If I change it I then get

TermDocsFreq1
test
TermDocsFreq1
test2

(but that doesn't resolve the problem with my real code, which doesn't seem to have this mistake :()

What I don't understand then is why, in the incorrect example, I don't get TermDocsFreq2 if I've actually created another document rather than updating one?

> --
> Ian.
>
> On Fri, Dec 16, 2011 at 4:54 PM, Paul Taylor wrote:
>> I'm adding documents to an index. At a later date I modify a document and
>> update the index, close the writer and open a new IndexReader. My
>> IndexReader iterates over the terms for that field and docFreq() returns one, as
>> I would expect. However, the iterator returns both the old value of the
>> document and the new value. I don't expect (or want) the old value to still
>> be in the index, so why is this?
>>
>> This full test program generates:
>>
>> TermDocsFreq1
>> test
>> TermDocsFreq1
>> test
>> test2
>>
>> I don't expect to see 'test' listed the second time.
>>
>> package com.jthink.jaikoz;
>>
>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.index.*;
>> import org.apache.lucene.store.RAMDirectory;
>> import org.apache.lucene.util.Version;
>>
>> public class LuceneTest
>> {
>>     public static void main(String []args)
>>     {
>>         try
>>         {
>>             String FIELD1="field1";
>>             RAMDirectory dir = new RAMDirectory();
>>             IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35,
>>                 new StandardAnalyzer(Version.LUCENE_35));
>>             IndexWriter iw = new IndexWriter(dir, iwc);
>>             Document document = new Document();
>>             document.add(new Field(FIELD1,"test", Field.Store.YES,
>>                 Field.Index.ANALYZED));
>>             iw.addDocument(document);
>>             iw.close();
>>
>>             IndexReader ir = IndexReader.open(dir,true);
>>             TermEnum terms = ir.terms(new Term(FIELD1));
>>             System.out.println("TermDocsFreq"+terms.docFreq());
>>             do
>>             {
>>                 if (terms.term() != null)
>>                 {
>>                     System.out.println(terms.term().text());
>>                 }
>>             }
>>             while (terms.next() && terms.term().field().equals(FIELD1));
>>
>>             IndexWriterConfig iwc2 = new IndexWriterConfig(Version.LUCENE_35,
>>                 new StandardAnalyzer(Version.LUCENE_35));
>>             iw = new IndexWriter(dir, iwc2);
>>             document = new Document();
>>             document.add(new Field(FIELD1,"test2", Field.Store.YES,
>>                 Field.Index.ANALYZED));
>>             iw.updateDocument(new Term(FIELD1,"term1"),document);
>>             iw.close();
>>
>>             ir = IndexReader.open(dir,true);
>>             terms = ir.terms(new Term(FIELD1));
>>             System.out.println("TermDocsFreq"+terms.docFreq());
>>             do
>>             {
>>                 if (terms.term() != null)
>>                 {
>>                     System.out.println(terms.term().text());
>>                 }
>>             }
>>             while (terms.next() && terms.term().field().equals(FIELD1));
>>         }
>>         catch(Exception ex)
>>         {
>>             ex.printStackTrace();
>>         }
>>     }
>>
>> }
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
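
To make the two points at the top of this reply concrete (docFreq() is a per-term count, and deletions may linger in it until segments are merged), here is a minimal toy model in plain Java. It is only a sketch and not Lucene code: ToyIndex, liveDocCount() and the in-memory maps are hypothetical names invented for illustration, and real Lucene may drop or merge segments sooner than this model suggests (in the corrected run quoted above, the stale term was already gone).

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a term dictionary with delete markers. Hypothetical; not Lucene's implementation. */
class ToyIndex {
    // term text -> ids of the documents that contain it, kept in term order
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<String, Set<Integer>>();
    // documents marked deleted but not yet physically removed ("not yet merged")
    private final Set<Integer> deleted = new HashSet<Integer>();
    private int nextDocId = 0;

    private Set<Integer> docs(String term) {
        Set<Integer> ids = postings.get(term);
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    /** Adds a document made of the given terms and returns its id. */
    int addDocument(String... terms) {
        int id = nextDocId++;
        for (String t : terms) {
            Set<Integer> ids = postings.get(t);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(t, ids);
            }
            ids.add(id);
        }
        return id;
    }

    /** Delete-then-add: documents containing 'term' are marked deleted, then the new one is added. */
    int updateDocument(String term, String... newTerms) {
        deleted.addAll(docs(term)); // a term that matches nothing deletes nothing
        return addDocument(newTerms);
    }

    /** Raw statistic for ONE term; still counts documents that are only marked deleted. */
    int docFreq(String term) {
        return docs(term).size();
    }

    /** Walks the postings for one term and skips deleted documents. */
    int liveDocCount(String term) {
        int live = 0;
        for (int id : docs(term)) {
            if (!deleted.contains(id)) {
                live++;
            }
        }
        return live;
    }

    Set<String> terms() {
        return postings.keySet();
    }

    public static void main(String[] args) {
        // The mistake from the thread: the update term matches no document.
        ToyIndex wrong = new ToyIndex();
        wrong.addDocument("test");
        wrong.updateDocument("term1", "test2");
        for (String t : wrong.terms()) {
            System.out.println(t + " docFreq=" + wrong.docFreq(t) + " live=" + wrong.liveDocCount(t));
        }

        // The corrected call: the old document is marked deleted before the add.
        ToyIndex right = new ToyIndex();
        right.addDocument("test");
        right.updateDocument("test", "test2");
        for (String t : right.terms()) {
            System.out.println(t + " docFreq=" + right.docFreq(t) + " live=" + right.liveDocCount(t));
        }
    }
}
```

In the first case each term belongs to exactly one document, so a per-term docFreq of 1 is expected even though two documents exist. In the second case the raw count for the old term can lag behind the live count, which is why walking the postings (the TermDocs approach) is the more accurate check.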