lucene-dev mailing list archives

From Otis Gospodnetic <>
Subject Re: potential indexing performance improvement for compound index - cut IO - have more files though
Date Fri, 15 Dec 2006 22:04:41 GMT
I think Doron is right on the money here.  I know one "customer" who'd be happy to trade
file descriptors for less IO.  It's exactly what Doron describes: a busy system
with a LOT of indices.  File descriptors are kept under control by carefully closing IndexSearchers,
and I can always raise the max open-files limit.  What I can't easily increase is
disk IO.  Sure, I could go from CFS to the multi-file format, but it would be nice to have
that third, middle-ground choice.


----- Original Message ----
From: Doron Cohen <>
Sent: Friday, December 15, 2006 2:55:41 PM
Subject: Re: potential indexing performance improvement for compound index - cut IO - have more files though

"Mike Klaas" <> wrote:
> My main comment is that the benefits of this change can be achieved by
> using the non-compound index format.  For people that care about the
> difference in performance, it isn't difficult to configure your system
> to mitigate the problems of the non-compound format, and they probably
> have already done so.
> It would help the people who are file-descriptor conscious, but it
> also increases lucene's fd footprint by a factor of four.

That's right - people worried about indexing performance can easily apply
the non-compound format.

My guess though is that most people just keep the default setting.

Large systems that maintain many indexes would be worried about the number
of file descriptors and would use the compound format. But it is not clear to
me which such systems would prefer - four times the file
descriptors, or twice the IO?  If such a third choice is supported
- "semi compound" - how many systems would {be able to / choose to} use
it? That probably depends on the specific system.
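
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Java. The factors of four (descriptors) and two (IO) are the ones quoted in this thread; the class name, the method names, and every input value are hypothetical, and the sketch assumes the middle format would cost about the same IO as the non-compound format.

```java
// Back-of-the-envelope comparison only; all inputs are hypothetical.
final class FormatTradeoff {
    // Relative factors quoted in the thread.
    static final int COMPOUND_IO_FACTOR = 2;      // compound format: about twice the IO
    static final int SEMI_COMPOUND_FD_FACTOR = 4; // middle format: about four times the descriptors

    /** Open descriptors across all indexes, relative to the compound format. */
    static long descriptors(long indexes, long fdsPerIndexCompound, boolean semiCompound) {
        long base = indexes * fdsPerIndexCompound;
        return semiCompound ? base * SEMI_COMPOUND_FD_FACTOR : base;
    }

    /** Indexing IO in GB, assuming the middle format matches the non-compound IO. */
    static double ioGigabytes(double gbNonCompound, boolean semiCompound) {
        return semiCompound ? gbNonCompound : gbNonCompound * COMPOUND_IO_FACTOR;
    }

    public static void main(String[] args) {
        long indexes = 500;     // hypothetical number of open indexes
        long fdsPerIndex = 10;  // hypothetical descriptors per index in compound format
        double gb = 1.0;        // hypothetical non-compound IO per indexing run

        System.out.println("compound:      fds=" + descriptors(indexes, fdsPerIndex, false)
                + " io=" + ioGigabytes(gb, false) + " GB");
        System.out.println("semi compound: fds=" + descriptors(indexes, fdsPerIndex, true)
                + " io=" + ioGigabytes(gb, true) + " GB");
    }
}
```

Whether the larger descriptor count fits under the open-files limit is the system-specific part; the sketch only makes the two sides of the trade comparable.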

I verified the IO factor by counting bytes read in
FSIndexInput.readInternal(byte[],int,int) and written in

 round  vect   stor   cmpnd  runCnt  recsPerRun  rec/s  elapsedSec  write  read
     0  true   true   true        1      100000  153.4      651.74   2 GB  1.9 GB
     1  true   true   false       1      100000  169.5      589.82   1 GB  0.9 GB
     2  false  false  true        1      100000  151.4      660.41   2 GB  1.9 GB
     3  false  false  false       1      100000  168.0      595.37   1 GB  0.9 GB

Indeed, there is a factor of two for both read bytes and written bytes.
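
The same kind of byte counting can be sketched outside Lucene with plain java.io wrappers. This is not the instrumentation used for the numbers above; the class IoCounter and its methods are hypothetical. It also assumes that the extra IO comes from copying already-written per-segment files into one compound file, and it ignores merges, which is why its read count without the copy is zero rather than the 0.9 GB measured above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: counts every byte that passes through wrapped streams.
final class IoCounter {
    long bytesRead;
    long bytesWritten;

    InputStream countReads(InputStream in) {
        return new FilterInputStream(in) {
            @Override public int read() throws IOException {
                int b = super.read();
                if (b >= 0) bytesRead++;
                return b;
            }
            @Override public int read(byte[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                if (n > 0) bytesRead += n;
                return n;
            }
        };
    }

    OutputStream countWrites(OutputStream target) {
        return new FilterOutputStream(target) {
            @Override public void write(int b) throws IOException {
                out.write(b);
                bytesWritten++;
            }
            @Override public void write(byte[] buf, int off, int len) throws IOException {
                out.write(buf, off, len);
                bytesWritten += len;
            }
        };
    }

    /** Writes each file once; if compound, also copies all of them into one extra file. */
    static IoCounter simulate(int files, int bytesPerFile, boolean compound) {
        IoCounter c = new IoCounter();
        try {
            byte[][] stored = new byte[files][];
            for (int i = 0; i < files; i++) {
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (OutputStream out = c.countWrites(sink)) {
                    out.write(new byte[bytesPerFile]);
                }
                stored[i] = sink.toByteArray();
            }
            if (compound) {
                byte[] buf = new byte[1024];
                try (OutputStream out = c.countWrites(new ByteArrayOutputStream())) {
                    for (byte[] file : stored) {
                        try (InputStream in = c.countReads(new ByteArrayInputStream(file))) {
                            int n;
                            while ((n = in.read(buf)) > 0) {
                                out.write(buf, 0, n);
                            }
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return c;
    }

    public static void main(String[] args) {
        IoCounter plain = simulate(4, 1000, false);
        IoCounter cfs = simulate(4, 1000, true);
        System.out.println("non-compound: written=" + plain.bytesWritten + " read=" + plain.bytesRead);
        System.out.println("compound:     written=" + cfs.bytesWritten + " read=" + cfs.bytesRead);
    }
}
```

With four 1000-byte files the sketch counts 4000 bytes written without the copy, and 8000 bytes written plus 4000 bytes read with it. That matches the shape of the table, where the compound runs add roughly 1 GB to both the write and the read totals.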

To unsubscribe, e-mail:
For additional commands, e-mail:
