lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: HPPC: High Performance Primitive Collections for Java
Date Wed, 21 Apr 2010 10:18:04 GMT
I believe a big part of the speedup comes from HPPC's ability to
mutate Map values in place, doing a single key lookup instead of two?
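
To make the one-lookup-versus-two point concrete, here is a minimal sketch
using only java.util.HashMap, not HPPC itself. It roughly mirrors the "jcf"
and "jcfWithHolder" variants in the quoted results. The names BigramCount,
MutableInt, key, countTwoLookups and countWithHolder are made up for
illustration, and the key packing is just my reading of "char|char
concatenation"; the actual benchmark code may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class BigramCount {
    /** Hypothetical mutable holder, standing in for the "MutableInteger" of the quoted benchmark. */
    public static final class MutableInt {
        public int value;
    }

    /** Packs two chars into one int key (assumed meaning of "char|char concatenation"). */
    public static int key(char a, char b) {
        return (a << 16) | b;
    }

    /** Two hash lookups per bigram: get() to read the count, put() to write it back. */
    public static Map<Integer, Integer> countTwoLookups(CharSequence text) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            int k = key(text.charAt(i), text.charAt(i + 1));
            Integer old = counts.get(k);               // lookup #1
            counts.put(k, old == null ? 1 : old + 1);  // lookup #2, plus a boxed value
        }
        return counts;
    }

    /** One lookup for keys already present: fetch the holder and mutate it in place. */
    public static Map<Integer, MutableInt> countWithHolder(CharSequence text) {
        Map<Integer, MutableInt> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            int k = key(text.charAt(i), text.charAt(i + 1));
            MutableInt holder = counts.get(k);         // the only lookup on the common path
            if (holder == null) {                      // first occurrence: insert once
                holder = new MutableInt();
                counts.put(k, holder);
            }
            holder.value++;                            // in-place update, no second lookup
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "abracadabra";
        int ab = key('a', 'b');
        System.out.println(countTwoLookups(text).get(ab));
        System.out.println(countWithHolder(text).get(ab).value);
    }
}
```

Note that the keys are still boxed Integers in both variants here; a
primitive-keyed map would avoid that cost as well.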

On Wed, Apr 21, 2010 at 13:56, Dawid Weiss <dawid.weiss@gmail.com> wrote:
> I have some cross-checks offline (fastutil, pcj, colt, trove). I
> didn't want to publish them because a great deal depends on the test
> case (micro-benchmark), CPU, JVM, system architecture, memory speed...
> you name it. That said, if you're curious, below are results for simple
> bigram counting on real data, using an int->int hash map with the two
> chars concatenated as the key. The results stay similar as the number
> of rounds grows. Note "jcf" (Java Collections Framework): the
> "jcfWithHolder" variant is basically HashMap<Integer, MutableInteger>.
> Like I said -- in most benchmarks, built-in JVM data structures do
> very well.
>
> Dawid
>
> BigramCounting.hppc: [measured 5 out of 7 rounds]
>  round: 0.19 [+- 0.02], round.gc: 0.00 [+- 0.00], GC.calls: 0,
> GC.time: 0.00, time.total: 1.49, time.warmup: 0.56, time.bench: 0.93
> BigramCounting.trove: [measured 5 out of 7 rounds]
>  round: 0.64 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 0,
> GC.time: 0.00, time.total: 4.52, time.warmup: 1.32, time.bench: 3.21
> BigramCounting.fastutilOpenHashMap: [measured 5 out of 7 rounds]
>  round: 0.65 [+- 0.11], round.gc: 0.00 [+- 0.00], GC.calls: 0,
> GC.time: 0.00, time.total: 4.24, time.warmup: 1.00, time.bench: 3.24
> BigramCounting.fastutilLinkedOpenHashMap: [measured 5 out of 7 rounds]
>  round: 0.52 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 0,
> GC.time: 0.00, time.total: 3.44, time.warmup: 0.84, time.bench: 2.60
> BigramCounting.pcjOpenHashMap: [measured 5 out of 7 rounds]
>  round: 0.60 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 1,
> GC.time: 0.01, time.total: 4.16, time.warmup: 1.17, time.bench: 2.99
> BigramCounting.pcjChainedHashMap: [measured 5 out of 7 rounds]
>  round: 0.47 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0,
> GC.time: 0.00, time.total: 3.41, time.warmup: 1.06, time.bench: 2.35
> BigramCounting.jcf: [measured 5 out of 7 rounds]
>  round: 0.70 [+- 0.05], round.gc: 0.00 [+- 0.00], GC.calls: 8,
> GC.time: 0.02, time.total: 5.44, time.warmup: 1.95, time.bench: 3.50
> BigramCounting.jcfWithHolder: [measured 5 out of 7 rounds]
>  round: 0.30 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 4,
> GC.time: 0.01, time.total: 2.21, time.warmup: 0.73, time.bench: 1.48
>
>
> On Wed, Apr 21, 2010 at 10:10 AM, John Wang <john.wang@gmail.com> wrote:
>> Hi Dawid:
>>
>>      Any performance comparisons with fastutil?
>>
>> Thanks
>>
>> -John
>>
>> On Mon, Apr 19, 2010 at 1:11 PM, Dawid Weiss <dawid.weiss@gmail.com> wrote:
>>>
>>> > Hmmm.. can anybody compare these to fastutil?
>>>
>>> I believe I can answer some of your questions.
>>>
>>> 1) HPPC is not directly Java Collections-compatible. It does have an
>>> interface hierarchy, but it's not a descendant of the familiar Set,
>>> Map or List. Fastutil is collections-compatible.
>>>
>>> 2) HPPC has open internals, so you can do anything you like once your
>>> collections are created, including manipulation of internal storage
>>> arrays, for instance. This was a design decision and goal. As with any
>>> sharp objects, improper use may cause harm.
>>>
>>> 3) HPPC uses assert instead of fixed condition checks. There are no
>>> attempts to detect misuse (fail-fast iterators, etc.).
>>>
>>> 4) fastutil is more mature, has support for more data structures
>>> (sorted trees, etc.) and was written by an excellent programmer
>>> (Sebastiano Vigna). HPPC was created internally for use at Carrot
>>> Search and was primarily motivated by speed; we believed that in
>>> certain applications direct access to collections' internals should be
>>> allowed and should be beneficial. Our micro-benchmarks show that this
>>> is largely true if you manipulate LOTS of data. For smaller data sets
>>> even built-in Java collections with boxed types do surprisingly well
>>> (due to HotSpot optimizations too).
>>>
>>> 5) There are subtle differences in how HPPC is written -- I use pretty
>>> much normal generic classes with some pseudo-intrinsics and
>>> regexp-substituted comments. Sebastiano uses the C++ preprocessor to
>>> generate Java classes from templates (yes, wicked).
>>>
>>> I look at Lucene and SOLR source code and learn a LOT from folks
>>> contributing to this project, so HPPC will hardly be any faster or
>>> better compared to what Lucene already has, but if anybody finds
>>> anything from HPPC useful, please take handfuls. I would love for this
>>> project to be finally merged with Mahout, but I intentionally left it in
>>> Carrot Search labs for a little while so that the API can stabilize
>>> (through our in-house experiments mostly).
>>>
>>> Thanks for showing your interest!
>>> Dawid
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>
>>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
