lucene-java-user mailing list archives

From Mark Harwood <markharw...@yahoo.co.uk>
Subject Re: Bet you didn't know Lucene can...
Date Tue, 25 Oct 2011 22:08:36 GMT
> Avg lookup time slightly less than a HashSet? Interesting.

Yep, the HashSet comparison was a surprise to me too. I threw it in as a datapoint for what I
thought would be the fastest option on the example dataset, but it is clearly not a long-term
answer to my problem as it costs so much in RAM.
Lucene started out at an avg of 3ms, but subsequent runs took it down dramatically due to OS file
caching. The all-in-memory HashSet implementation did not show the same speed-ups
between runs.

> Is the code
> to these benchmarks available somewhere?


I can make the code available, but sharing the data wouldn't be possible.
The English Wikipedia page titles are probably an equivalent size and shape, so I could try
and package something up around that as a benchmarking tool for others to play with.
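
For anyone who wants to try this before a packaged tool exists, here is a minimal sketch of the kind of lookup harness described in this thread. It is not the original benchmark code. The class and method names (SetLookupBenchmark, syntheticKeys, randomProbes, avgLookupNanos, bestOfRuns) are hypothetical, the keys are synthetic stand-ins for the real key list, and only the in-memory HashSet baseline is wired in. A Lucene-backed or database-backed set would be passed in as another Predicate<String>.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

public class SetLookupBenchmark {

    /** Builds n synthetic string keys (an assumption; the real data is not available). */
    static List<String> syntheticKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    /** Picks `count` random probes; each is either an existing key or a key that is absent. */
    static List<String> randomProbes(List<String> keys, int count, long seed) {
        Random rnd = new Random(seed);
        List<String> probes = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String k = keys.get(rnd.nextInt(keys.size()));
            probes.add(rnd.nextBoolean() ? k : k + "-missing");
        }
        return probes;
    }

    /** Runs every probe once and returns the average lookup time in nanoseconds. */
    static double avgLookupNanos(Predicate<String> contains, List<String> probes) {
        int hits = 0;
        long start = System.nanoTime();
        for (String p : probes) {
            if (contains.test(p)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        // Reference `hits` so the lookups are not trivially dead code.
        if (hits < 0) {
            throw new IllegalStateException("unreachable");
        }
        return (double) elapsed / probes.size();
    }

    /** Repeats the probe run and keeps the best (lowest) average, since caching makes later runs faster. */
    static double bestOfRuns(Predicate<String> contains, List<String> probes, int runs) {
        double best = Double.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            best = Math.min(best, avgLookupNanos(contains, probes));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> keys = syntheticKeys(100_000);      // scale up towards millions as RAM allows
        Set<String> set = new HashSet<>(keys);           // in-memory baseline
        List<String> probes = randomProbes(keys, 10_000, 42L);
        double best = bestOfRuns(set::contains, probes, 5);
        System.out.printf("HashSet best avg lookup: %.1f ns%n", best);
    }
}
```

Taking the best of several runs mirrors how the figures in this thread were recorded. It hides cold-start cost, so start-up time and RAM use would need to be measured separately.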

Cheers
Mark

On 25 Oct 2011, at 22:47, Dawid Weiss wrote:

> Avg lookup time slightly less than a HashSet? Interesting. Is the code
> to these benchmarks available somewhere?
> 
> Dawid
> 
> On Tue, Oct 25, 2011 at 9:57 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>> 
>> On Oct 25, 2011, at 11:26 AM, mark harwood wrote:
>> 
>>>>> using Lucene that don't fit under the core premise of full text search
>>> 
>>> I've had several use cases over the years that use features peculiar to Lucene,
>>> but here's a very simple one I came across today that illustrates its raw index lookup capability:
>>> 
>>> I needed a fast, scalable and persistent "Set" implementation to maintain a large
>>> cold-list (millions of string-based keys).
>>> I benchmarked various implementations using a set of ~6 million keys with 10,000
>>> random key lookups.
>>> When it comes to RAM use, retrieval times and start-up costs, Lucene stands up
>>> very well against equivalent embedded databases for this task:
>>> 
>>> * Benchmarks for times to initially open the set when stored on disk:  http://goo.gl/dJL3g
>>> * Benchmarks for Avg key lookup time once opened: http://goo.gl/SG79N
>>> * Stats for RAM use after 10,000 lookups: http://goo.gl/MyJDn
>> 
>> Those charts are beautiful.  I have Lucene/Solr down as an excellent key-value store
>> (I've seen this done many times) and these charts further cement it.
>> 
>>> 
>>> I don't doubt all of these implementations could be tweaked (e.g. optimizing
>>> the Lucene index, various DB-specific settings) but I tried to use sensible defaults to make
>>> the tests fair, e.g. use of prepared statements, indexes, minimal data retrieved.
>>> Speeds varied with each run of the random lookup test due to OS-level caching
>>> effects, so the best times were recorded in each case.
>>> The HashSet tests are loaded entirely from file (hence the long start-up time)
>>> and are not a scalable solution because of RAM costs.
>>> MySQL requires an inter-process call as it was not embedded, but even using a
>>> remote Lucene call I get significantly better performance (avg 0.5ms lookup vs MySQL 10ms).
>>> 
>>> 
>>> Cheers
>>> Mark
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Grant Ingersoll <gsingers@apache.org>
>>> To: java-user@lucene.apache.org
>>> Cc:
>>> Sent: Saturday, 22 October 2011, 10:11
>>> Subject: Bet you didn't know Lucene can...
>>> 
>>> Hi All,
>>> 
>>> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." (http://na11.apachecon.com/talks/18396).
>>> It's based on my observation that, over the years, a number of us in the community have done
>>> some pretty cool things using Lucene that don't fit under the core premise of full text search.
>>> I've got a fair number of ideas for the talk (easily enough for 1 hour), but I wanted to
>>> reach out to hear your stories of ways you've (ab)used Lucene and Solr to see if we couldn't
>>> extend the conversation to a bit more than the conference and also see if I can't inject more
>>> ideas beyond the ones I have.  I don't need deep technical details, just the high-level use
>>> case and the basic insight that led you to believe Lucene could solve the problem.
>>> 
>>> Thanks in advance,
>>> Grant
>>> 
>>> --------------------------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> 
>> 

