lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject PATCH: SegmentsReader/SegmentsTermEnum
Date Wed, 03 Sep 2003 13:58:55 GMT
Hi Lucene Developers,

First, let me thank you all for this excellent piece of software.
I am using Lucene in several projects, and I am currently also
building more advanced text-mining applications on top of it.
Because of that I have spent a lot of time studying the Lucene
sources, and I will come up with a couple of proposals for bug
fixes over the next few days. Here is the first one:

I think I can fix a bug in SegmentsTermEnum.
One can create a TermEnum from an IndexReader in two ways:

indexReader.terms()
indexReader.terms(t)
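To make the difference between these two calls concrete, here is a minimal, self-contained Java model of a single segment's term enumeration. The class SegmentTermEnumSketch and its static terms(...) factories are hypothetical stand-ins, not Lucene classes. They only mimic the positioning contract discussed here: terms() starts before the first term, while terms(t) is already positioned on t. When t itself is absent, the sketch assumes the enum lands on the first term greater than t.

```java
import java.util.Arrays;

// Hypothetical model of ONE segment's term dictionary; not a Lucene class.
// It only mimics the positioning contract of indexReader.terms() vs terms(t).
final class SegmentTermEnumSketch {
    private final String[] terms;   // term texts, sorted ascending
    private final int[] docFreqs;   // docFreqs[i] = number of documents containing terms[i]
    private int pos;                // cursor; -1 means "before the first term"

    private SegmentTermEnumSketch(String[] terms, int[] docFreqs, int pos) {
        this.terms = terms;
        this.docFreqs = docFreqs;
        this.pos = pos;
    }

    // Like indexReader.terms(): starts BEFORE the first term,
    // so the caller must call next() before term() returns anything.
    static SegmentTermEnumSketch terms(String[] terms, int[] docFreqs) {
        return new SegmentTermEnumSketch(terms, docFreqs, -1);
    }

    // Like indexReader.terms(t): starts ON t, or (assumption for an absent t)
    // on the first term > t, so term() is valid immediately without next().
    static SegmentTermEnumSketch terms(String[] terms, int[] docFreqs, String t) {
        int i = Arrays.binarySearch(terms, t);
        if (i < 0) {
            i = -i - 1;                 // insertion point: index of the first term > t
        }
        return new SegmentTermEnumSketch(terms, docFreqs, i);
    }

    // Advances to the next term; returns false once the terms are exhausted.
    boolean next() {
        if (pos < terms.length) {
            pos++;
        }
        return pos < terms.length;
    }

    String term()  { return (pos >= 0 && pos < terms.length) ? terms[pos] : null; }
    int docFreq()  { return (pos >= 0 && pos < terms.length) ? docFreqs[pos] : 0; }
}
```

A caller of terms() therefore loops with `while (e.next()) { ... }`, whereas a caller of terms(t) first processes e.term() (after checking that it is non-null) and only then calls next().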

If one gets a TermEnum starting at a specified term t, one does not
have to call enum.next() before using it: the enum is valid from the
beginning, and calling enum.next() moves on to the next term. However,
this behaviour only holds if the index consists of a single segment. If
the index consists of several segments, term t is delivered twice: the
first time via enum.term() right after indexReader.terms(t), and the
second time after calling enum.next(). Furthermore, the initial document
frequency may be wrong (if t occurs in more than one segment). The
problem can be fixed by calling next() in the constructor of
SegmentsTermEnum.
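The idea behind the fix can be illustrated with a small self-contained Java sketch. MultiSegmentTermEnumSketch and its nested Segment class are hypothetical stand-ins for SegmentsTermEnum and the per-segment enums, not Lucene's actual code. Segments are merged with a priority queue ordered by their current term, and next() sums the document frequency over every segment positioned on the same term. In the terms(t) case (t non-null here), the constructor calls next() once, as the patch proposes, so the merged enum starts on t exactly once, with its combined frequency. Passing t == null models terms(), which stays unpositioned until the caller calls next().

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a merged, multi-segment term enumeration; not Lucene's code.
final class MultiSegmentTermEnumSketch {

    // Cursor over one segment's sorted term dictionary.
    static final class Segment {
        final String[] terms;
        final int[] docFreqs;
        int pos;

        Segment(String[] terms, int[] docFreqs, int pos) {
            this.terms = terms;
            this.docFreqs = docFreqs;
            this.pos = pos;
        }

        String term()  { return pos < terms.length ? terms[pos] : null; }
        int docFreq()  { return docFreqs[pos]; }
        boolean next() {
            if (pos < terms.length) {
                pos++;
            }
            return pos < terms.length;
        }
    }

    // Segments ordered by their current term; only non-exhausted segments are queued.
    private final PriorityQueue<Segment> queue =
            new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
    private String term;    // current merged term; null if unpositioned or exhausted
    private int docFreq;    // docFreq of the current term, summed over all segments

    // t == null plays the role of indexReader.terms(); otherwise indexReader.terms(t).
    MultiSegmentTermEnumSketch(List<String[]> segTerms, List<int[]> segDocFreqs, String t) {
        for (int s = 0; s < segTerms.size(); s++) {
            String[] ts = segTerms.get(s);
            int i = 0;
            if (t != null) {
                i = Arrays.binarySearch(ts, t);
                if (i < 0) {
                    i = -i - 1;         // first term > t in this segment
                }
            }
            Segment seg = new Segment(ts, segDocFreqs.get(s), i);
            if (seg.term() != null) {
                queue.add(seg);
            }
        }
        if (t != null) {
            next();   // the proposed fix: consume t once, merged across all segments
        }
    }

    // Moves to the smallest queued term, summing docFreq over every segment that has it.
    boolean next() {
        Segment top = queue.peek();
        if (top == null) {
            term = null;
            docFreq = 0;
            return false;
        }
        term = top.term();
        docFreq = 0;
        while (top != null && top.term().equals(term)) {
            queue.poll();               // removes the same head that peek() returned
            docFreq += top.docFreq();
            if (top.next()) {
                queue.add(top);         // re-queue under its new current term
            }
            top = queue.peek();
        }
        return true;
    }

    String term()  { return term; }
    int docFreq()  { return docFreq; }
}
```

For example, with one segment holding {apple, banana} and another holding {banana, cherry}, the enum created for "banana" reports banana once, with both segments' frequencies added, and the first next() moves on to cherry instead of repeating banana.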
I attach a test that demonstrates the problem and a patch that fixes it.

kind regards,
Christoph

-- 
*****************************************************************
* Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
* Detego Software GmbH       Mobile: +49 179 1128469            *
* Keuslinstr. 13             Fax.:   +49 721 151516176          *
* 80798 München, Germany     Email:  goller@detego-software.de  *
*****************************************************************
