lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Murzaku" <li...@lissus.com>
Subject RE: Problems with exact matces on non-tokenized fields...
Date Fri, 27 Sep 2002 22:22:15 GMT
Thanks! Now that I think of it, I was searching in the documentation for
a method to reset the document 'd' to "empty" once it is indexed so that
it could be reused but I didn't find one and then the bug slipped
through. I was afraid that all these objects might not be garbage
collected in time. In a test much smaller than infinite:
        for (i=0; i<=100000000; i++) {
            Document d = new Document();
            d.add(Field.Keyword("nr", Integer.toString(i)));
            d.add(Field.Keyword("element","POST"));
            writer.addDocument(d);
        }
I got very soon java.lang.OutOfMemoryError but, by just forcing garbage
collection at the end of the cycle, the memory usage is now a very flat
line... Sorry for bothering you.

-----Original Message-----
From: Doug Cutting [mailto:cutting@lucene.com] 
Sent: Friday, September 27, 2002 2:24 PM
To: Lucene Users List
Subject: Re: Problems with exact matces on non-tokenized fields...


lex Murzaku wrote:
> I was trying this as well but now I get something I can't understand: 
> My query (Query: +element:POST +nr:3) is supposed to match only one 
> record. Indeed Lucene returns that record with the highest score but 
> it also returns others that shouldn't be there at all even if it was 
> an OR query. Another observation: it returns all records where "nr" >=

> 3. Notice the last record returned doesn't contain neither "POST" nor 
> "3". I am attaching a self contained running example with this problem

> and would appreciate any comment.
>  
> 0.6869936 Keyword<nr:3> Keyword<element:POST>
> 0.63916886 Keyword<nr:4> Keyword<element:POST>
> 0.6044586 Keyword<nr:6> Keyword<element:POST>
> 0.5773442 Keyword<nr:5> Keyword<element:POST>
> 0.56318253 Keyword<nr:9> Keyword<element:POST>
> 0.54449975 Keyword<nr:8> Keyword<element:POST>
> 0.5247468 Keyword<nr:7> Keyword<element:POST>
> 0.45054603 Keyword<nr:10> Keyword<element:GET>

Phew!  It took me a while to spot this one...

The bug is with your test program.  You keep adding fields to the same 
document instance.  If you change your program to print the entire 
document, you'll see:

Query: +element:POST +nr:3
0.6869936 Document<Keyword<element:POST> Keyword<nr:3> 
Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>

Keyword<element:POST> Keyword<nr:0>>
0.63916886 Document<Keyword<element:POST> Keyword<nr:4> 
Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>

Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>
0.6044586 Document<Keyword<element:POST> Keyword<nr:6> 
Keyword<element:POST> Keyword<nr:5> Keyword<element:POST> Keyword<nr:4>

Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>

Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>
0.5773442 Document<Keyword<element:POST> Keyword<nr:5> 
Keyword<element:POST> Keyword<nr:4> Keyword<element:POST> Keyword<nr:3>

Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>

Keyword<element:POST> Keyword<nr:0>>
0.56318253 Document<Keyword<element:POST> Keyword<nr:9> 
Keyword<element:POST> Keyword<nr:8> Keyword<element:POST> Keyword<nr:7>

Keyword<element:POST> Keyword<nr:6> Keyword<element:POST> Keyword<nr:5>

Keyword<element:POST> Keyword<nr:4> Keyword<element:POST> Keyword<nr:3>

Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>

Keyword<element:POST> Keyword<nr:0>>
0.54449975 Document<Keyword<element:POST> Keyword<nr:8> 
Keyword<element:POST> Keyword<nr:7> Keyword<element:POST> Keyword<nr:6>

Keyword<element:POST> Keyword<nr:5> Keyword<element:POST> Keyword<nr:4>

Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>

Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>
0.5247468 Document<Keyword<element:POST> Keyword<nr:7> 
Keyword<element:POST> Keyword<nr:6> Keyword<element:POST> Keyword<nr:5>

Keyword<element:POST> Keyword<nr:4> Keyword<element:POST> Keyword<nr:3>

Keyword<element:POST> Keyword<nr:2> Keyword<element:POST> Keyword<nr:1>

Keyword<element:POST> Keyword<nr:0>>
0.45054603 Document<Keyword<element:GET> Keyword<nr:10> 
Keyword<element:POST> Keyword<nr:9> Keyword<element:POST> Keyword<nr:8>

Keyword<element:POST> Keyword<nr:7> Keyword<element:POST> Keyword<nr:6>

Keyword<element:POST> Keyword<nr:5> Keyword<element:POST> Keyword<nr:4>

Keyword<element:POST> Keyword<nr:3> Keyword<element:POST> Keyword<nr:2>

Keyword<element:POST> Keyword<nr:1> Keyword<element:POST> Keyword<nr:0>>

So you need to create a new document instance each time.  I've attached 
a modified version of your test program that does this and gives the 
results you desire:

Query: +element:POST +nr:3
1.0 Document<Keyword<element:POST> Keyword<nr:3>>

Doug


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message