lucene-dev mailing list archives

From Earwin Burrfoot <>
Subject Re: A new Lucene Directory available
Date Sun, 15 Nov 2009 14:43:45 GMT
> About the RAMDirectory comparison, as you said yourself the bytes
> aren't read constantly but just at index reopen so I wouldn't be too
> worried about the "bunch of methods" as they're executed once per
> segment loading;
The bytes /are/ read constantly (via the readByte() method). I believe
that is the innermost loop you can hope to find in Lucene.

> A RAMDirectory is AFAIK not recommended as you could hit memory
> limits and because it's basically a synchronized HashMap;
On the other hand, just as I mentioned - the only access to said
synchronized HashMap happens when you open an input stream on a file.
That, unlike readByte(), happens rarely, as input streams are cloned
after creation as needed.
As for memory limits, your unbounded local cache hits them with the same ease.
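The access pattern being defended here can be sketched with a self-contained model (plain Java, not Lucene's actual RAMDirectory/IndexInput classes - the names below are illustrative): the synchronized map is consulted exactly once, in openInput(), while per-thread readers are clones whose readByte() hot loop never touches a lock.

```java
import java.util.HashMap;
import java.util.Map;

// Model of the RAMDirectory access pattern under discussion
// (illustrative names, not Lucene's real classes).
class ModelRAMDirectory {
    private final Map<String, byte[]> files = new HashMap<>();

    synchronized void createFile(String name, byte[] data) {
        files.put(name, data);
    }

    // The only synchronized read path: hit once per file open,
    // not once per byte.
    synchronized ModelInput openInput(String name) {
        return new ModelInput(files.get(name));
    }
}

class ModelInput implements Cloneable {
    private final byte[] data; // shared, treated as immutable after write
    private int pos;

    ModelInput(byte[] data) { this.data = data; }

    // Hot inner loop: plain array read, no locking, no allocation.
    byte readByte() { return data[pos++]; }

    // Per-thread readers are clones; cloning copies only the position
    // and never goes back to the synchronized map.
    @Override
    public ModelInput clone() {
        try {
            return (ModelInput) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}
```

Each clone keeps its own position over the shared byte[], which is why the synchronized lookup cost is paid once per open rather than per read.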

> Instances of ChunkCacheKey are not created for each single byte read
> but for each byte[] buffer, being the size of these buffers configurable.
No, they are! :-), rev. 1103:
120           public byte readByte() throws IOException {
132              buffer = getChunkFromPosition(cache, fileKey, filePosition, bufferSize);
141           }
getChunkFromPosition() is called each time readByte() is invoked. It
creates 1-2 instances of ChunkCacheKey.
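A hypothetical model of the allocation pattern being criticized (ChunkKey and ChunkedInput are illustrative names, not Infinispan's actual classes): every readByte() constructs a fresh key object just to locate its chunk in the cache, so the innermost loop allocates on each byte.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-in for a cache key of (file name, chunk number).
class ChunkKey {
    final String file;
    final int chunk;

    ChunkKey(String file, int chunk) {
        this.file = file;
        this.chunk = chunk;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey k = (ChunkKey) o;
        return k.chunk == chunk && k.file.equals(file);
    }

    @Override
    public int hashCode() {
        return Objects.hash(file, chunk);
    }
}

class ChunkedInput {
    private final Map<ChunkKey, byte[]> cache;
    private final String file;
    private final int chunkSize;
    private long pos;

    ChunkedInput(Map<ChunkKey, byte[]> cache, String file, int chunkSize) {
        this.cache = cache;
        this.file = file;
        this.chunkSize = chunkSize;
    }

    // One key allocation and one map lookup per *byte* read --
    // the per-call overhead the mail is pointing out.
    byte readByte() {
        ChunkKey key = new ChunkKey(file, (int) (pos / chunkSize));
        byte[] chunk = cache.get(key);
        return chunk[(int) (pos++ % chunkSize)];
    }
}
```

Contrast this with the buffered pattern above, where the hot loop indexes into an already-resolved byte[] and allocates nothing.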

> This was decided after observations that it was
> improving performance to "chunk" segments in smaller pieces rather
> than have huge arrays of bytes, but if you like you can configure it
> to degenerate to approach the one key per segment ratio.
Locally, it's better not to chunk segments (unless you hit the 2GB
barrier). When shuffling them over a network - I can't say.

> Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can scale :-)
I'm just following two of your initial comparisons. And the only
characteristic that scales with such an approach is queries/s. Index
size - definitely not; updates/s - questionable.

> About JGroups I'm not technically prepared for a match, but I've heard
> of different stories of much bigger than 20 nodes business critical
> clusters working very well. Sure, it won't scale without a proper
> configuration at all levels: os, jgroups and infrastructure.
The volume of messages travelling around, the length of GC delays
versus cluster size, and the messaging mode all matter.
They used reliable synchronous multicasts, so once one node starts
collecting, all others wait (or worse - send retries).
Another one starts collecting, then another; partially delivered
messages hold threads - caboom!
How is locking handled here? With a central broker it could probably work.

Kirill Zakharenko/Кирилл Захаренко
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
