lucene-java-user mailing list archives

From mark harwood <>
Subject Re: Bet you didn't know Lucene can...
Date Wed, 26 Oct 2011 17:02:28 GMT
>>  > Avg lookup time slightly less than a HashSet? Interesting.

Scratch that. A new dataset and revised code show HashSets out in front (though still not a
realistic option for very large sets):

In this benchmark I removed code that was common to all the previous tests: it first retrieved
a random key from a test query Lucene index and then looked that key up in the target set (a
choice of database, HashSet or a different Lucene index).

I assumed that, being common to all tests, this initial Lucene-based fetch would not bias the
results, but it did. Now the tests first load a random sample of 100k keys from a flat file
and only *then* start the timer on the look-ups.
I'm also using public domain Wikipedia data, so I can release the code and data somewhere if
that's of interest.
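
For anyone wanting to try something similar in the meantime, here is a minimal sketch of the
revised timing approach for the HashSet case only. It is not the actual benchmark code: the
class and method names are hypothetical, the keys are synthetic strings rather than Wikipedia
titles, and the random sample is drawn in memory as a stand-in for the flat file. The point it
illustrates is simply that the probe keys are prepared before the timer starts.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class LookupBenchmark {

    // Picks sampleSize random keys (with replacement) from allKeys.
    // Hypothetical stand-in for loading the pre-sampled keys from a flat file.
    static List<String> sampleKeys(List<String> allKeys, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        List<String> sample = new ArrayList<>(sampleSize);
        for (int i = 0; i < sampleSize; i++) {
            sample.add(allKeys.get(rnd.nextInt(allKeys.size())));
        }
        return sample;
    }

    // Times only the look-ups, so key selection cannot bias the result.
    // Returns {hits, elapsedNanos}.
    static long[] timeLookups(Set<String> target, List<String> probes) {
        long hits = 0;
        long start = System.nanoTime();
        for (String key : probes) {
            if (target.contains(key)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        return new long[] {hits, elapsed};
    }

    public static void main(String[] args) {
        // Synthetic keys (an assumption; the real test used Wikipedia data).
        List<String> allKeys = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            allKeys.add("title-" + i);
        }
        Set<String> target = new HashSet<>(allKeys);

        // The sample is prepared first; the timer starts inside timeLookups.
        List<String> probes = sampleKeys(allKeys, 100_000, 42L);
        long[] result = timeLookups(target, probes);

        System.out.printf("hits=%d avg=%.1f ns/lookup%n",
                result[0], (double) result[1] / probes.size());
    }
}
```

A single pass like this is still sensitive to JIT warm-up and GC, so repeated runs (as in the
earlier tests) would be needed before reading much into the numbers.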


----- Original Message -----
From: Dawid Weiss <>
Sent: Tuesday, 25 October 2011, 23:17
Subject: Re: Bet you didn't know Lucene can...

> Lucene started out at an avg 3ms but subsequent runs took it down dramatically due to
> OS file caching. The all-in-memory hashset implementation clearly did not demonstrate the
> same speed ups between runs.

I'm not saying the benchmark was wrong or anything, but this is
surprising. I mean, the default HashSet impl. is a bucketed
linked-list implementation. It made me wonder how the data was
distributed. Even with OS file caching the in-memory data structure
shouldn't fall short, at least intuitively.

> I can make the code available but the data wouldn't be possible.
> The English Wikipedia page titles are probably an equivalent size and shape so I could
> try and package something up around that as a benchmarking tool for others to play with.

If you find a spare cycle, it'd be great, thanks!


To unsubscribe, e-mail:
For additional commands, e-mail:
