lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lance Norskog <>
Subject Re: Proposal - a high performance Key-Value store based on Lucene APIs/concepts
Date Sat, 24 Mar 2012 23:50:21 GMT

On Sat, Mar 24, 2012 at 4:17 PM, Mark Harwood <> wrote:
> OK I have some code and benchmarks for this solution up on a Google Code project here:
> The project exists to address the performance challenges I have encountered when dealing
with large graphs. It  uses all of the Wikipedia links as a test dataset and a choice of
graph databases (most of which use Lucene BTW).
> The test data is essentially 130 million edges representing links between pages e.g.
> To load the data all of the graph databases have to translate user-defined keys like
"Russia" into an internally-generated node ID using a service that looks like this:
>        interface KeyService
>        {
>                //Returns existing nodeid or -1 if is not already in store
>                public long getGraphNodeId(String udk);
>                //Adds a new record - assumption is client has checked user defined
key (udk) is not stored already using getGraphNodeId
>                public void put(String udk, long graphNodeId);
>        }
> This is a challenge on a dataset of this size. I tried using a Lucene-based implementation
for this service with the following optimisations:
> 1) a Bloomfilter to quickly "know what we don't know"
> 2) an LRUCache to hold on to commonly referenced vertices e.g the Wikipdedia article
for "United States"
> 3) a hashmap representing the unflushed state of Lucene's IndexWriter to avoid the need
for excessive flushing with NRT reader etc
> The search/write performance showed the familiar saw-toothing as the Lucene index grew
in size and merge operations kicked in.
> The KVStore implementation I wrote attempts to tackle this problem using a fundamentally
different form of index. The results from the KVStore runs show it was twice as fast as this
 Lucene solution and maintains constant performance without the saw toothing effect.
> Benchmark figures are here:
> The KVStore source code is here: and the Lucene implementation I
compare against is also in the project.
> Cheers
> Mark
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Lance Norskog

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message