lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Bet you didn't know Lucene can...
Date Wed, 26 Oct 2011 17:33:07 GMT
Yes, sure, that would be interesting. GitHub would probably be a good spot?

Dawid

On Wed, Oct 26, 2011 at 7:02 PM, mark harwood <markharw00d@yahoo.co.uk> wrote:
>> Avg lookup time slightly less than a HashSet? Interesting.
>
> Scratch that. A new dataset and revised code show HashSets out in front (though still not a realistic option for very large sets): http://goo.gl/Lb4J1
>
> In this benchmark I removed the code that was common to all previous tests: it first retrieved a random key from a test-query Lucene index and then looked that key up in the target set (a choice of database, HashSet or a different Lucene index).
>
> I assumed that, being common to all tests, this initial Lucene-based fetch would not bias the results, but it did. Now the tests first load a random sample of 100k keys from a flat file and only *then* start the timer on the look-ups.
> I'm also using public domain Wikipedia data, so I can release the code and data somewhere if that's of interest.
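
The revised methodology described above (load the sample of keys first, then time only the look-ups) could be sketched roughly as follows. This is not Mark's benchmark code, which is not shown in the thread. The class name `LookupBenchmark`, the helpers `sampleKeys` and `countHits`, the synthetic `Title_<n>` keys and the fixed seed are all assumptions made for illustration; a real run would read the keys from the flat file instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: all setup happens before the timer starts,
// so only the look-ups themselves are measured.
class LookupBenchmark {

    /** Draws sampleSize keys at random (with replacement) using a fixed seed. */
    static List<String> sampleKeys(List<String> candidates, int sampleSize, long seed) {
        Random random = new Random(seed);
        List<String> sample = new ArrayList<>(sampleSize);
        for (int i = 0; i < sampleSize; i++) {
            sample.add(candidates.get(random.nextInt(candidates.size())));
        }
        return sample;
    }

    /** The operation under test: one contains() call per probe key. */
    static int countHits(Set<String> target, List<String> probes) {
        int hits = 0;
        for (String key : probes) {
            if (target.contains(key)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumption: synthetic keys stand in for the flat file of real keys.
        int keyCount = 1_000_000;
        List<String> keys = new ArrayList<>(keyCount);
        for (int i = 0; i < keyCount; i++) {
            keys.add("Title_" + i);
        }
        Set<String> target = new HashSet<>(keys);

        // Load the 100k probe keys up front, outside the timed region.
        List<String> probes = sampleKeys(keys, 100_000, 42L);

        // One untimed pass so JIT compilation does not dominate the measurement.
        countHits(target, probes);

        // Only now start the timer.
        long start = System.nanoTime();
        int hits = countHits(target, probes);
        long elapsed = System.nanoTime() - start;

        System.out.printf("%d hits, avg %.1f ns per look-up%n",
                hits, (double) elapsed / probes.size());
    }
}
```

The hit count is printed so the timed loop has an observable result. A hand-rolled timer like this is only a rough guide; a harness such as JMH would give more trustworthy numbers.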
>
> Cheers
> Mark
>
>
>
> ----- Original Message -----
> From: Dawid Weiss <dawid.weiss@gmail.com>
> To: java-user@lucene.apache.org
> Cc:
> Sent: Tuesday, 25 October 2011, 23:17
> Subject: Re: Bet you didn't know Lucene can...
>
>> Lucene started out at an average of 3 ms, but subsequent runs brought that down dramatically thanks to OS file caching. The all-in-memory HashSet implementation clearly did not show the same speed-ups between runs.
>
> I don't say the benchmark was wrong or anything, but this is
> surprising. I mean, the default HashSet impl. is a bucketed
> linked-list implementation. It made me wonder how the data was
> distributed. Even with OS file caching the in-memory data structure
> shouldn't fall short, at least intuitively.
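
One way to look into the distribution question raised above is to count how many keys land in each bucket. The sketch below is an illustration only, not code from the thread. It assumes the bucket index computation used by `java.util.HashMap` (which backs `HashSet`) in OpenJDK 8 and later; the JVMs current in 2011 used a different bit-spreading function, so the numbers are indicative at best. The class name `BucketHistogram`, its methods and the synthetic keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: estimates how evenly a set of distinct keys
// would spread over the buckets of a hash table.
class BucketHistogram {

    /**
     * Mirrors the index computation of OpenJDK 8+ HashMap (an assumption
     * about the JVM): fold the high bits into the low bits, then mask.
     * tableSize must be a power of two.
     */
    static int bucketIndex(Object key, int tableSize) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (tableSize - 1);
    }

    /**
     * Returns a histogram where entry k is the number of buckets holding
     * exactly k keys. Keys are assumed to be distinct.
     */
    static int[] chainLengthHistogram(List<String> keys, int tableSize) {
        int[] perBucket = new int[tableSize];
        int longest = 0;
        for (String key : keys) {
            int length = ++perBucket[bucketIndex(key, tableSize)];
            longest = Math.max(longest, length);
        }
        int[] histogram = new int[longest + 1];
        for (int count : perBucket) {
            histogram[count]++;
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Assumption: synthetic keys stand in for the real data set.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add("Title_" + i);
        }
        // 262144 is the smallest power of two that keeps 100k keys
        // under the default 0.75 load factor when doubling from 16.
        int[] histogram = chainLengthHistogram(keys, 262_144);
        for (int k = 0; k < histogram.length; k++) {
            System.out.printf("buckets with %d keys: %d%n", k, histogram[k]);
        }
    }
}
```

A histogram dominated by buckets holding zero or one key would suggest that collisions are not what slows the look-ups down.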
>
>> I can make the code available, but not the data.
>> The English Wikipedia page titles are probably of an equivalent size and shape, so I could try to package something up around that as a benchmarking tool for others to play with.
>
> If you find a spare cycle, it'd be great, thanks!
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org