lucene-dev mailing list archives

From Sanne Grinovero <>
Subject Re: A new Lucene Directory available
Date Sun, 15 Nov 2009 04:33:46 GMT
Hi John,
I haven't run a long, reliable benchmark yet, so at the moment I
can't really quote numbers.
Suggestions and help with performance testing are welcome: I expect it
will shine in some situations but not necessarily all, so choosing a
fair ratio of concurrent writers/searchers, number of nodes in the
cluster, and resources per node to compare this Directory with others
will always be difficult.

On paper the premises are good: it's all in-memory, as long as the
data fits; it will distribute data across nodes, and overflow to disk
is supported (called passivation). A permanent store can also be
configured, so you could have it periodically flush incremental
changes to slower storage such as a database, a filesystem, or a cloud
storage service. This makes it possible to avoid losing state even
when all nodes are shut down.
A RAMDirectory is AFAIK not recommended, as you could hit memory
limits and because it's basically a synchronized HashMap; Infinispan
implements ConcurrentHashMap and doesn't need coarse synchronization.
Even if the data is replicated across nodes, each node has its own
local cache, so when caches are warm and all segments fit in memory it
should, theoretically, be the fastest Directory around. The more it
has to read from disk, the more it will behave like an FSDirectory
with some extra buffering.
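To illustrate the concurrency point above, here's a minimal,
hypothetical Java sketch (not taken from either code base) of the
difference between a synchronized-HashMap-style store and a
ConcurrentHashMap-style store; the class and key names are invented
for illustration:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DirectoryStoreSketch {
    // Roughly what a RAMDirectory does: every access contends on one lock.
    static final Map<String, byte[]> synchronizedStore =
            Collections.synchronizedMap(new HashMap<String, byte[]>());

    // Roughly what an Infinispan-style cache does: lock striping and
    // non-blocking reads, so concurrent readers don't contend with each other.
    static final ConcurrentHashMap<String, byte[]> concurrentStore =
            new ConcurrentHashMap<String, byte[]>();

    public static void main(String[] args) {
        byte[] segment = new byte[] {1, 2, 3};
        synchronizedStore.put("_0.cfs", segment);
        concurrentStore.put("_0.cfs", segment);
        // Both return the same data; the difference is purely in how much
        // readers contend under concurrent load.
        System.out.println(synchronizedStore.get("_0.cfs").length); // 3
        System.out.println(concurrentStore.get("_0.cfs").length);   // 3
    }
}
```

The functional behaviour is identical; only the locking granularity
differs, which is what matters once many search threads hit the same
segment files.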

As per Lucene's design, writes can happen at only one node at a time:
a single IndexWriter owns the lock, but IndexReaders and Searchers are
not blocked, so this Directory should behave exactly as if you had
multiple processes sharing a local NIOFSDirectory: you can't scale out
writers, but you can scale near-linearly with readers by adding more
nodes.
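The single-writer rule can be pictured with an atomic putIfAbsent,
which is the kind of primitive a distributed cache provides. This is a
hypothetical sketch in the spirit of a Lucene LockFactory, not the
actual Infinispan implementation; all names here are invented:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of an obtain/release cycle for a write lock stored in a
// shared map with atomic operations, mimicking Lucene's Lock semantics.
public class WriteLockSketch {
    private final ConcurrentMap<String, String> sharedCache;
    private final String nodeId;

    public WriteLockSketch(ConcurrentMap<String, String> sharedCache, String nodeId) {
        this.sharedCache = sharedCache;
        this.nodeId = nodeId;
    }

    // Only one node at a time can own "write.lock"; others fail fast
    // rather than blocking, as Lucene's Lock.obtain() does.
    public boolean obtain() {
        return sharedCache.putIfAbsent("write.lock", nodeId) == null;
    }

    // Remove the lock only if this node still owns it.
    public void release() {
        sharedCache.remove("write.lock", nodeId);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<String, String>();
        WriteLockSketch nodeA = new WriteLockSketch(cache, "nodeA");
        WriteLockSketch nodeB = new WriteLockSketch(cache, "nodeB");
        System.out.println(nodeA.obtain()); // true: nodeA holds the write lock
        System.out.println(nodeB.obtain()); // false: writers don't scale out
        nodeA.release();
        System.out.println(nodeB.obtain()); // true: the lock is free again
    }
}
```

Readers never touch "write.lock" at all, which is why search can keep
scaling while writes remain serialized.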

Besides performance, the reasons to implement this were to be able to
easily add or remove processing power from a service (clouds), to make
it easier to share indexes across nodes, and last but not least to
remove single points of failure: all data is distributed and there is
no notion of a Master, so the service keeps running fine when any node
is killed.

I hope this piques your interest; sorry I couldn't provide numbers.


On Sat, Nov 14, 2009 at 11:15 PM, John Wang <> wrote:
> Hi Sanne:
>     Very interesting!
>     What kind of performance should we expect with this, compared to a regular
> FSDirectory on a local HD?
> Thanks
> -John
> On Sat, Nov 14, 2009 at 11:44 AM, Sanne Grinovero
> <> wrote:
>> Hello all,
>> I'm a Lucene user and fan, and I wanted to tell you that we just
>> released a first technology preview of a distributed in-memory
>> Directory for Lucene.
>> The release announcement:
>> From there you'll find links to the Wiki, to the sources, to the issue
>> tracker. A minimal demo is included with the sources.
>> This was developed together with Google Summer of Code student Lukasz
>> Moren and much support from the Infinispan and Hibernate Search teams,
>> as we are storing the index segments on Infinispan and using its
>> atomic distributed locks to implement a Lucene LockFactory.
>> The initial idea was to contribute it directly to Lucene, but as
>> Infinispan is an LGPL dependency we had to distribute it with
>> Infinispan (as the other way around would have introduced some legal
>> issues); still, we hope you appreciate the effort and are interested
>> in giving it a try.
>> All kinds of feedback are welcome, especially on benchmarking
>> methodologies, as I have yet to do some serious performance tests.
>> Main code, built with Maven2:
>> svn co
>> infinispan-directory
>> Demo, see the Readme:
>> svn co
>> lucene-demo
>> Best Regards,
>> Sanne
>> --
>> Sanne Grinovero
>> Sourcesense - making sense of Open Source:
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:

Sanne Grinovero
Sourcesense - making sense of Open Source:

