From: "Alex Murzaku"
To: "'Lucene Users List'"
Subject: RE: Problems with exact matces on non-tokenized fields...
Date: Fri, 27 Sep 2002 18:22:15 -0400

Thanks! Now that I think of it, I was searching in the documentation for a method to reset the document 'd' to "empty" once it is indexed so that it could be reused but I didn't find one and then the bug slipped through. I was afraid that all these objects might not be garbage collected in time.
In a test much smaller than infinite:

    for (int i = 0; i <= 100000000; i++) {
        Document d = new Document();
        d.add(Field.Keyword("nr", Integer.toString(i)));
        d.add(Field.Keyword("element", "POST"));
        writer.addDocument(d);
    }

I very soon got a java.lang.OutOfMemoryError but, by just forcing garbage collection at the end of the cycle, the memory usage is now a very flat line...

Sorry for bothering you.

-----Original Message-----
From: Doug Cutting [mailto:cutting@lucene.com]
Sent: Friday, September 27, 2002 2:24 PM
To: Lucene Users List
Subject: Re: Problems with exact matces on non-tokenized fields...

Alex Murzaku wrote:
> I was trying this as well but now I get something I can't understand:
> My query (Query: +element:POST +nr:3) is supposed to match only one
> record. Indeed Lucene returns that record with the highest score but
> it also returns others that shouldn't be there at all even if it was
> an OR query. Another observation: it returns all records where "nr" >=
> 3. Notice the last record returned doesn't contain neither "POST" nor
> "3". I am attaching a self contained running example with this problem
> and would appreciate any comment.
>
> 0.6869936 Keyword Keyword
> 0.63916886 Keyword Keyword
> 0.6044586 Keyword Keyword
> 0.5773442 Keyword Keyword
> 0.56318253 Keyword Keyword
> 0.54449975 Keyword Keyword
> 0.5247468 Keyword Keyword
> 0.45054603 Keyword Keyword

Phew! It took me a while to spot this one...

The bug is with your test program. You keep adding fields to the same document instance.
If you change your program to print the entire document, you'll see:

Query: +element:POST +nr:3
0.6869936 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
0.63916886 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
0.6044586 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
0.5773442 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
0.56318253 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
0.54449975 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
0.5247468 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
0.45054603 Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword Keyword>

So you need to create a new document instance each time. I've attached a modified version of your test program that does this and gives the results you desire:

Query: +element:POST +nr:3
1.0 Document Keyword>

Doug
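
For readers without the attachments, the effect of reusing one document instance can be reproduced without Lucene at all. The sketch below is not the attached test program: `Doc`, `index`, and `countMatches` are hypothetical stand-ins invented here, and the "index" is just a list of field snapshots. It only illustrates why a shared instance makes every later record also carry the fields of the earlier ones, so a query for nr:3 matches every record added from that point on.

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentReuseDemo {
    // Hypothetical stand-in for a document: an append-only list of "name:value" fields.
    public static final class Doc {
        final List<String> fields = new ArrayList<>();
        void add(String name, String value) { fields.add(name + ":" + value); }
    }

    // Hypothetical stand-in for indexing: stores a snapshot of each document's fields
    // at the moment it is added. With reuse == true the same Doc is used every time.
    public static List<List<String>> index(int n, boolean reuse) {
        List<List<String>> indexed = new ArrayList<>();
        Doc shared = new Doc();
        for (int i = 0; i < n; i++) {
            Doc d = reuse ? shared : new Doc();
            d.add("nr", Integer.toString(i));
            d.add("element", "POST");
            indexed.add(new ArrayList<>(d.fields));
        }
        return indexed;
    }

    // Counts records that contain both required terms (an AND query).
    public static int countMatches(List<List<String>> indexed, String a, String b) {
        int hits = 0;
        for (List<String> fields : indexed) {
            if (fields.contains(a) && fields.contains(b)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shared instance: records 3..7 all contain nr:3, so 5 records match.
        System.out.println(countMatches(index(8, true), "element:POST", "nr:3"));
        // Fresh instance per record: only record 3 matches.
        System.out.println(countMatches(index(8, false), "element:POST", "nr:3"));
    }
}
```

With the shared instance the record counts grow with every iteration, which mirrors the long field lists in the printed documents above; with a fresh instance per iteration each record holds exactly its own two fields.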