lucene-java-user mailing list archives

From Dawid Weiss <>
Subject Re: Bet you didn't know Lucene can...
Date Tue, 25 Oct 2011 21:47:07 GMT
Avg lookup time slightly less than a HashSet? Interesting. Is the code
to these benchmarks available somewhere?
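
The benchmark code itself is not part of this thread. As a rough sketch only, the HashSet side of such a measurement (N keys, a batch of random lookups, best average over several runs, as in the quoted message below) might look like the following. The class name, key format, seed and run count are all assumptions made for illustration, and the key count is reduced from the ~6 million mentioned in the thread so that it fits a default heap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical harness: not the code used for the charts in this thread.
class SetLookupBenchmark {

    // Written on every run so the JIT cannot discard the lookup loop.
    static volatile int sink;

    /** Builds n distinct string keys; the "key-<i>" format is made up for this sketch. */
    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    /** Picks `count` keys at random (repeats allowed) using a fixed seed. */
    static List<String> sampleLookups(List<String> keys, int count, long seed) {
        Random rnd = new Random(seed);
        List<String> sample = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            sample.add(keys.get(rnd.nextInt(keys.size())));
        }
        return sample;
    }

    /**
     * Runs the whole lookup batch `runs` times and returns the best (lowest)
     * average time per lookup, in nanoseconds.
     */
    static double bestAvgLookupNanos(Predicate<String> contains, List<String> lookups, int runs) {
        double best = Double.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            int hits = 0;
            long start = System.nanoTime();
            for (String key : lookups) {
                if (contains.test(key)) {
                    hits++;
                }
            }
            long elapsed = System.nanoTime() - start;
            sink = hits;
            best = Math.min(best, (double) elapsed / lookups.size());
        }
        return best;
    }

    public static void main(String[] args) {
        int keyCount = 1_000_000;   // the thread used ~6 million keys
        int lookupCount = 10_000;   // matches the 10,000 lookups in the thread
        List<String> keys = makeKeys(keyCount);

        long loadStart = System.nanoTime();
        Set<String> set = new HashSet<>(keys);
        long loadMillis = (System.nanoTime() - loadStart) / 1_000_000;

        List<String> lookups = sampleLookups(keys, lookupCount, 42L);
        double bestAvg = bestAvgLookupNanos(set::contains, lookups, 5);

        System.out.printf("HashSet build: %d ms, best avg lookup: %.1f ns%n", loadMillis, bestAvg);
    }
}
```

Because `bestAvgLookupNanos` takes a `Predicate<String>`, any other store (a Lucene term lookup, an embedded database query) could be timed the same way by passing its own "contains" check. A serious comparison would also need JIT warm-up and a proper harness such as JMH.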


On Tue, Oct 25, 2011 at 9:57 PM, Grant Ingersoll <> wrote:
> On Oct 25, 2011, at 11:26 AM, mark harwood wrote:
>>>> using Lucene that don't fit under the core premise of full text search
>> I've had several use cases over the years that use features peculiar to Lucene,
>> but here's a very simple one I came across today that illustrates its raw index
>> lookup capability:
>> I needed a fast, scalable and persistent "Set" implementation to maintain a large
>> cold-list (millions of string-based keys).
>> I benchmarked various implementations using a set of ~6 million keys with 10,000
>> random key lookups.
>> When it comes to RAM use, retrieval times and start-up costs, Lucene stands up very
>> well against equivalent embedded databases for this task:
>> * Benchmarks for times to initially open the set when stored on disk:
>> * Benchmarks for Avg key lookup time once opened:
>> * Stats for RAM use after 10,000 lookups:
> Those charts are beautiful.  I have Lucene/Solr down as an excellent key-value store
> (I've seen this done many times) and these charts further cement it.
>> I don't doubt all of these implementations could be tweaked (e.g. optimizing the
>> Lucene index, various DB-specific settings) but I tried to use sensible defaults to
>> make the tests fair, e.g. use of prepared statements, indexes, minimal data retrieved.
>> Speeds varied with each run of the random lookup test due to OS-level caching effects,
>> so the best times were recorded in each case.
>> The HashSet tests are loaded entirely from file (hence the long start-up time) and
>> are not a scalable solution because of RAM costs.
>> MySQL requires an inter-process call as it was not embedded, but even using a remote
>> Lucene call I get significantly better performance (avg 0.5ms lookup vs MySQL 10ms).
>> Cheers
>> Mark
>> ----- Original Message -----
>> From: Grant Ingersoll <>
>> To:
>> Cc:
>> Sent: Saturday, 22 October 2011, 10:11
>> Subject: Bet you didn't know Lucene can...
>> Hi All,
>> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can...".
>> It's based on my observation that, over the years, a number of us in the community
>> have done some pretty cool things using Lucene that don't fit under the core premise
>> of full text search.  I've got a fair number of ideas for the talk (easily enough for
>> 1 hour), but I wanted to reach out to hear your stories of ways you've (ab)used Lucene
>> and Solr to see if we couldn't extend the conversation to a bit more than the
>> conference and also see if I can't inject more ideas beyond the ones I have.  I don't
>> need deep technical details, but just high level use case and the basic insight that
>> led you to believe Lucene could solve the problem.
>> Thanks in advance,
>> Grant
>> --------------------------------------------
>> Grant Ingersoll
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --------------------------------------------
> Grant Ingersoll
