lucene-dev mailing list archives

From Incze Lajos <in...@mail.matav.hu>
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/index MultiReader.java FilterIndexReader.java IndexReader.java SegmentReader.java
Date Tue, 20 Apr 2004 01:14:22 GMT
On Mon, Apr 19, 2004 at 01:34:13PM -0700, Doug Cutting wrote:
> Christoph Goller wrote:
> >Concerning close, I would like to give a similar behavior to 
> >IndexWriter. It
> >should only close the directory if it was explicitly opened for it. 
> >Would this be ok?
> 
> Yes.  I think someone complained about this recently.  That would be a 
> good fix.  Thanks.
> 
> >I am currently racking my brain on the skipTo stuff. I hope to get it right
> >tomorrow. I didn't do the proposed changes on the file format so far.
> >However, I meditated upon SegmentTermDocs.skipTo. It's really a headache :-)
> 
> It is surprisingly complicated!  I wish it wasn't so...

I'm putting my findings here, as they seem related to me. In a mid-size
corpus I've found the following mystery:

1) +SZIDO:"jan 1"                                    -- 92 hits
2) +SZIDO:"jan 1" +TYPE:ER-CIKK                      -- 433 hits
3) +SZIDO:"jan 1" +TYPE:ER-CIKK NONSENSE:nonsense    -- 92 hits

Result 2) is obviously nonsense: adding a required clause cannot increase
the hit count. The NONSENSE field in the 3rd query does not exist. Although
I do not understand what's happening, and couldn't produce a revealing test
case, I've found that if I switch off the ConjunctionScorer optimization
(the same way the 3rd query switches it off) by inserting

///////////////////////////////////////////////////////////////////
      allRequired = false;
///////////////////////////////////////////////////////////////////
      if (allRequired && noneBoolean) {           // ConjunctionScorer is okay

this bug disappears. I also found that (at least for me) only the
PhraseQuery produces this result. If I replace the 2nd query with

2A) +SZIDO:(+jan +1) +TYPE:ER-CIKK

I get the (correct) 92 hits. I'm almost sure that there is something
wrong with the document order and skipTo that is specific to the
PhraseQuery.
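
To make the expectation concrete, here is a toy sketch of a conjunction
that leapfrogs two sorted doc id streams with skipTo. It is NOT the Lucene
ConjunctionScorer: DocIterator, conjunction and the doc ids are made up
for illustration, and skipTo is assumed to behave like repeated next()
calls until doc() >= target. Under that assumption the result can never
have more hits than the smaller input, which is why 433 > 92 points to a
broken ordering or skipTo somewhere.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a conjunction driven by skipTo; hypothetical, not Lucene code. */
public class ConjunctionSketch {

    /** Hypothetical stand-in for a scorer over strictly ascending doc ids. */
    static final class DocIterator {
        private final int[] docs;
        private int pos = -1;

        DocIterator(int[] docs) { this.docs = docs; }

        /** Current doc id; only valid after next() or skipTo() returned true. */
        int doc() { return docs[pos]; }

        /** Moves to the next doc; returns false once the stream is exhausted. */
        boolean next() {
            if (pos < docs.length) pos++;
            return pos < docs.length;
        }

        /** Assumed contract: advance at least once, stop at first doc >= target. */
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (target > doc());
            return true;
        }
    }

    /** Collects the doc ids present in both streams by leapfrogging. */
    static List<Integer> conjunction(DocIterator a, DocIterator b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) return hits;
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());                       // both clauses match
                if (!a.next() || !b.next()) return hits;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return hits;     // a is behind, catch up
            } else {
                if (!b.skipTo(a.doc())) return hits;     // b is behind, catch up
            }
        }
    }

    public static void main(String[] args) {
        int[] phrase = {3, 8, 15, 21};                   // made-up phrase matches
        int[] type = {1, 3, 5, 8, 9, 15, 20, 21, 30};    // made-up TYPE matches
        List<Integer> hits =
            conjunction(new DocIterator(phrase), new DocIterator(type));
        System.out.println(hits);                        // prints [3, 8, 15, 21]
    }
}
```

If either stream returned doc ids out of order, or if skipTo landed on the
wrong doc, the leapfrog above would silently miss or mis-pair documents,
which is the kind of failure I suspect here.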

incze

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

