lucene-dev mailing list archives

From Sanne Grinovero <>
Subject Re: A new Lucene Directory available
Date Sun, 15 Nov 2009 16:11:14 GMT
Hi again Earwin,
thank you very much for spotting the byte reading issue; it's
definitely not what I intended.

I never claimed an improved updates/s ratio, except maybe when
compared to scheduled rsyncs :-)
Our goal is to scale on queries/sec while usage semantics stay
unchanged, so you can open an IndexWriter as if it were local to make
updates cluster-wide. This is very useful for clustering the many
products already using Lucene which currently implement exotic index
management workarounds or rely on shared filesystems, as they weren't
designed for it from the beginning the way Solr was.
I mentioned JIRA: have you noticed how slow it can get on larger
deployments? That's because there is currently no way to deploy it
clustered (other than by using Terracotta), as it relies heavily on
Lucene and index changes need to be applied in real time.

About locking and jgroups.. please switch over to so you can get better answers and I
don't have to spam the Lucene developers.


On Sun, Nov 15, 2009 at 3:43 PM, Earwin Burrfoot <> wrote:
>> About the RAMDirectory comparison, as you said yourself the bytes
>> aren't read constantly but just at index reopen so I wouldn't be too
>> worried about the "bunch of methods" as they're executed once per
>> segment loading;
> The bytes /are/ read constantly (readByte() method). I believe that is
> the innermost loop you can hope to find in Lucene.
>> A RAMDirectory is AFAIK not recommended as you could hit memory limits
>> and because it's basically a synchronized HashMap;
> On the other hand, just as I mentioned, the only access to said
> synchronized HashMap is done when you open an InputStream on a file.
> That, unlike readByte(), happens rarely, as InputStreams are cloned
> after creation as needed.
> As for memory limits, your unbounded local cache hits them with the
> same ease.
>> Instances of ChunkCacheKey are not created for each single byte read
>> but for each byte[] buffer; the size of these buffers is configurable.
> No, they are! :-)
>, rev. 1103:
> 120           public byte readByte() throws IOException {
> .........
> 132              buffer = getChunkFromPosition(cache, fileKey,
> filePosition, bufferSize);
> .........
> 141           }
> getChunkFromPosition() is called each time readByte() is invoked. It
> creates 1-2 instances of ChunkCacheKey.
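
To make the point above concrete, here is a minimal sketch of a reader
whose readByte() only goes back to the cache when the current buffer is
exhausted, so a key is allocated once per chunk rather than once per
byte. It is not the InfinispanDirectory code: ChunkedReadSketch,
ChunkKey, ChunkedInput and the cacheLookups counter are hypothetical
names invented for illustration, and a plain HashMap stands in for the
distributed cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; none of these names come from the Infinispan code base. */
final class ChunkedReadSketch {

    /** Hypothetical cache key: one instance per (file, chunk index) lookup. */
    static final class ChunkKey {
        final String file;
        final int chunk;

        ChunkKey(String file, int chunk) {
            this.file = file;
            this.chunk = chunk;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ChunkKey)) return false;
            ChunkKey other = (ChunkKey) o;
            return chunk == other.chunk && file.equals(other.file);
        }

        @Override
        public int hashCode() {
            return 31 * file.hashCode() + chunk;
        }
    }

    /** Hypothetical input that touches the cache once per chunk, not once per byte. */
    static final class ChunkedInput {
        private final Map<ChunkKey, byte[]> cache;
        private final String file;
        private byte[] buffer = new byte[0];
        private int bufferPos = 0;
        private int nextChunk = 0;
        int cacheLookups = 0; // exposed only to make the effect observable

        ChunkedInput(Map<ChunkKey, byte[]> cache, String file) {
            this.cache = cache;
            this.file = file;
        }

        byte readByte() {
            // Hot path: a bounds check and an array read, no allocation.
            if (bufferPos >= buffer.length) {
                refill();
            }
            return buffer[bufferPos++];
        }

        private void refill() {
            // Cold path: the only place a key object is created.
            cacheLookups++;
            byte[] next = cache.get(new ChunkKey(file, nextChunk));
            if (next == null || next.length == 0) {
                throw new IllegalStateException("read past end of " + file);
            }
            nextChunk++;
            buffer = next;
            bufferPos = 0;
        }
    }

    public static void main(String[] args) {
        Map<ChunkKey, byte[]> cache = new HashMap<>();
        cache.put(new ChunkKey("_0.cfs", 0), new byte[] {1, 2, 3, 4});
        cache.put(new ChunkKey("_0.cfs", 1), new byte[] {5, 6, 7, 8});

        ChunkedInput in = new ChunkedInput(cache, "_0.cfs");
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += in.readByte();
        }
        System.out.println("sum=" + sum + " lookups=" + in.cacheLookups);
    }
}
```

Reading eight bytes from two four-byte chunks performs two cache
lookups in this sketch; a reader that builds a key inside readByte()
would perform at least eight.
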
>> This was decided after observing that it improved performance to
>> "chunk" segments into smaller pieces rather than have huge arrays
>> of bytes, but if you like you can configure it to degenerate
>> towards one key per segment.
> Locally, it's better not to chunk segments (unless you hit the 2 GB
> barrier). When shuffling them over the network, I can't say.
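
As a rough illustration of the chunking trade-off discussed above, the
sketch below splits a segment's bytes into fixed-size pieces and maps a
file position to a chunk index and an offset. ChunkSplitSketch and its
methods are hypothetical helpers, not part of Lucene or Infinispan;
choosing a chunk size at least as large as the segment gives the
"one key per segment" case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only; these helpers are assumptions, not an existing API. */
final class ChunkSplitSketch {

    /** Splits a segment into pieces of at most chunkSize bytes each. */
    static List<byte[]> split(byte[] segment, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int start = 0;
        while (start < segment.length) {
            // Computed from the remaining length to avoid int overflow.
            int len = Math.min(chunkSize, segment.length - start);
            chunks.add(Arrays.copyOfRange(segment, start, start + len));
            start += len;
        }
        return chunks;
    }

    /** Index of the chunk holding the byte at filePosition. */
    static int chunkIndex(long filePosition, int chunkSize) {
        return (int) (filePosition / chunkSize);
    }

    /** Offset of that byte inside its chunk. */
    static int offsetInChunk(long filePosition, int chunkSize) {
        return (int) (filePosition % chunkSize);
    }
}
```

With a 10-byte segment, a chunk size of 4 yields three chunks of 4, 4
and 2 bytes, while a chunk size of 1000 yields a single chunk.
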
>> Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can scale :-)
> I'm just following two of your initial comparisons. And the only
> characteristic that can be scaled with such an approach is queries/s.
> Index size: definitely not. Updates/s: questionable.
>> About JGroups I'm not technically prepared for a match, but I've heard
>> several stories of business-critical clusters much bigger than 20
>> nodes working very well. Sure, it won't scale without a proper
>> configuration at all levels: OS, JGroups and infrastructure.
> The volume of messages travelling around, length of GC delays VS
> cluster size and messaging mode matter.
> They used reliable synchronous multicasts, so - once one node starts
> collecting, all others wait (or worse - send retries).
> Another one starts collecting, then another, partially delivered
> messages hold threads - caboom!
> How is locking handled here? With a central broker it probably can work.
> --
> Kirill Zakharenko/Кирилл Захаренко (
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Sanne Grinovero
Sourcesense - making sense of Open  Source:
