lucene-dev mailing list archives

From "info axinews" <>
Subject Re: lucene benchmark and profiling
Date Fri, 16 May 2003 22:54:13 GMT
Thanks for your response, Doug (I am very honored).

> Is your index optimized?  If it is not, then this is not surprising.

Yes, my index is optimized. I thought it was not, because the performance
was not as good as I had hoped (my small index is much faster), but it was
optimized.

> An unoptimized index consists of a set of inverted indexes, each called
> a segment.  Each segment has a term dictionary, contained in the .tii
> and .tis files.  The .tii is read entirely into memory and tells where
> to seek in the .tis file.  An average of 64 terms in the .tis file must
> then be scanned to find the requested term.  If the average term entry
> is around 10 bytes long, then this would result in 640 bytes read per
> query term per segment, regardless of whether the term exists in the
> index.  If it does exist, then the .frq and (in the case of a phrase
> query) the .prx file must also be read.
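
A minimal sketch of the lookup described in the quoted paragraph, not Lucene's
actual code. The class name, the in-memory lists standing in for the .tii and
.tis files, and the interval of 128 (chosen so that a scan averages about 64
terms) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a two-level term dictionary (not a Lucene class).
class SparseTermLookup {
    // Assumption: every 128th term is indexed, so a scan averages ~64 terms.
    static final int INDEX_INTERVAL = 128;

    final List<String> tis;                      // all terms, sorted (stands in for .tis)
    final List<String> tii = new ArrayList<>();  // every 128th term (stands in for .tii)
    int scanned;                                 // terms scanned by the last lookup

    SparseTermLookup(List<String> sortedTerms) {
        tis = sortedTerms;
        for (int i = 0; i < sortedTerms.size(); i += INDEX_INTERVAL) {
            tii.add(sortedTerms.get(i));
        }
    }

    boolean contains(String term) {
        scanned = 0;
        // Binary search the in-memory index for the last entry <= term.
        int lo = 0, hi = tii.size() - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (tii.get(mid).compareTo(term) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (block < 0) {
            return false; // term sorts before the first term in the dictionary
        }
        // Scan at most INDEX_INTERVAL terms, starting at the "seek" position.
        int start = block * INDEX_INTERVAL;
        int end = Math.min(start + INDEX_INTERVAL, tis.size());
        for (int i = start; i < end; i++) {
            scanned++;
            int cmp = tis.get(i).compareTo(term);
            if (cmp == 0) return true;
            if (cmp > 0) return false; // passed the slot where the term would be
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            terms.add(String.format("t%06d", i));
        }
        SparseTermLookup dict = new SparseTermLookup(terms);
        System.out.println(dict.contains("t050000") + ", scanned " + dict.scanned);
        System.out.println(dict.contains("t050000x") + ", scanned " + dict.scanned);
    }
}
```

In this model a lookup for a missing term still scans part of one block, and
the scan never exceeds INDEX_INTERVAL terms however many terms the dictionary
holds.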

That is clearer. So, if I understand correctly, every search still reads at
least some bytes, which is why a search is not instantaneous even when the
word is not in the index. You say that the .tii file is held in RAM, but what
I want to know is whether all of the "TermInfo" entries (stored in .tii) are
recomputed for each search. That would explain why the number of bytes
processed grows as the index grows (which is normal, you could say, but less
expected when the search is for a word that is not indexed).
I would therefore like to know whether, as my index grows (4 GB at the moment,
but soon 40 GB), the search time will grow in the same proportion (that is,
10 times longer). If I understand what you said, the search time for a word
that is not indexed should stay the same and not depend on the index size.
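
As a rough check of that reading, the quoted figures (about 64 terms scanned,
about 10 bytes per entry) can be turned into a small estimate. This is only a
sketch: the class and method names are hypothetical, the 20-segment case is an
invented comparison, and it counts .tis bytes only, ignoring the .frq and .prx
reads needed when the term does exist.

```java
// Hypothetical back-of-the-envelope estimate (not part of Lucene).
class DictionaryReadEstimate {
    // Bytes of .tis scanned for one query term across all segments.
    static long tisBytesPerTerm(int segments, int avgTermsScanned, int avgEntryBytes) {
        return (long) segments * avgTermsScanned * avgEntryBytes;
    }

    public static void main(String[] args) {
        // Optimized index: one segment, whatever its size on disk.
        System.out.println(tisBytesPerTerm(1, 64, 10));   // prints 640
        // Assumed unoptimized index with 20 segments, for comparison.
        System.out.println(tisBytesPerTerm(20, 64, 10));  // prints 12800
    }
}
```

Under these assumptions the estimate depends on the number of segments, not on
the 4 GB or 40 GB size of the index.
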
You can test my Lucene integration:  (not a commercial website), and you may
think "this guy is stupid", because the search results come back very fast
(and I want to thank all the developers who made that possible). The reason
is that I now use the IBM VM, and searches are now 3 to 4 times faster. But
will it still be this fast in the future, with a 40 GB index?

