lucene-dev mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: HPPC: High Performance Primitive Collections for Java
Date Wed, 21 Apr 2010 09:56:28 GMT
I have some cross-checks offline (fastutil, pcj, colt, trove). I
didn't want to publish them because a great deal depends on the test
case (micro-benchmark), CPU, JVM, system architecture, memory speed...
you name it. That said, if you're curious, below are the results of a
simple bigram-counting benchmark on real data, using an int->int hash
map (the two chars of a bigram concatenated into one int key). The
results stay similar as the number of rounds grows. Note "jcf" (Java
Collections Framework): "jcfWithHolder" is basically
HashMap<Integer, MutableInteger>. Like I said, built-in JVM data
structures do very well in most benchmarks.
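To make the two JDK-only variants concrete, here is a minimal sketch. It assumes the bigram key packs the first char into the high 16 bits of an int and the second into the low 16 bits. `MutableInteger` is not a JDK class, so a hypothetical `MutableInt` holder stands in for it. The class and method names are illustrative and are not taken from the actual benchmark code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of the two JDK-only bigram counters (all names are hypothetical). */
final class BigramCountingSketch {
    /** Hypothetical mutable counter, standing in for the MutableInteger mentioned above. */
    static final class MutableInt {
        int value;
    }

    /** Packs two chars into one int key: first char in the high 16 bits, second in the low 16. */
    static int bigramKey(char first, char second) {
        return (first << 16) | second;
    }

    /** "jcf"-style: get-then-put, i.e. two hash lookups per bigram and a re-boxed value. */
    static Map<Integer, Integer> countBoxed(CharSequence text) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            int key = bigramKey(text.charAt(i), text.charAt(i + 1));
            Integer old = counts.get(key);              // key is autoboxed here
            counts.put(key, old == null ? 1 : old + 1); // and again here, along with the new value
        }
        return counts;
    }

    /** "jcfWithHolder"-style: for bigrams already seen, one lookup and an in-place increment. */
    static Map<Integer, MutableInt> countWithHolder(CharSequence text) {
        Map<Integer, MutableInt> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            int key = bigramKey(text.charAt(i), text.charAt(i + 1));
            MutableInt c = counts.get(key);
            if (c == null) {
                c = new MutableInt();
                counts.put(key, c);
            }
            c.value++; // no re-boxing and no second put for bigrams already seen
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, MutableInt> counts = countWithHolder("abab");
        System.out.println(counts.get(bigramKey('a', 'b')).value); // prints 2
    }
}
```

The holder variant trades one small object per distinct key for fewer lookups and no boxing on the hot path, which is one plausible reason it scores well against the primitive-map libraries here.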

Dawid

BigramCounting, measured 5 out of 7 rounds each (times in seconds):

benchmark                  round           round.gc        GC.calls  GC.time  time.total  time.warmup  time.bench
hppc                       0.19 [+- 0.02]  0.00 [+- 0.00]         0     0.00        1.49         0.56        0.93
trove                      0.64 [+- 0.01]  0.00 [+- 0.00]         0     0.00        4.52         1.32        3.21
fastutilOpenHashMap        0.65 [+- 0.11]  0.00 [+- 0.00]         0     0.00        4.24         1.00        3.24
fastutilLinkedOpenHashMap  0.52 [+- 0.01]  0.00 [+- 0.00]         0     0.00        3.44         0.84        2.60
pcjOpenHashMap             0.60 [+- 0.01]  0.00 [+- 0.00]         1     0.01        4.16         1.17        2.99
pcjChainedHashMap          0.47 [+- 0.00]  0.00 [+- 0.00]         0     0.00        3.41         1.06        2.35
jcf                        0.70 [+- 0.05]  0.00 [+- 0.00]         8     0.02        5.44         1.95        3.50
jcfWithHolder              0.30 [+- 0.00]  0.00 [+- 0.00]         4     0.01        2.21         0.73        1.48


On Wed, Apr 21, 2010 at 10:10 AM, John Wang <john.wang@gmail.com> wrote:
> Hi Dawid:
>
>      Any performance comparisons with fastutil?
>
> Thanks
>
> -John
>
> On Mon, Apr 19, 2010 at 1:11 PM, Dawid Weiss <dawid.weiss@gmail.com> wrote:
>>
>> > Hmmm.. can anybody compare these to fastutil?
>>
>> I believe I can answer some of your questions.
>>
>> 1) HPPC is not directly Java Collections-compatible. It does have an
>> interface hierarchy, but it does not descend from the familiar Set,
>> Map or List. Fastutil is Collections-compatible.
>>
>> 2) HPPC has open internals, so you can do anything you like once your
>> collections are created, including manipulating the internal storage
>> arrays directly. This was a deliberate design decision and goal. As
>> with any sharp object, improper use may cause harm.
>>
>> 3) HPPC uses assertions instead of always-on condition checks. There
>> are no attempts to detect misuse (fail-fast iterators, etc.).
>>
>> 4) fastutil is more mature, supports more data structures
>> (sorted trees, etc.) and was written by an excellent programmer
>> (Sebastiano Vigna). HPPC was created internally for use at Carrot
>> Search and was primarily motivated by speed; we believed that in
>> certain applications direct access to collections' internals should be
>> allowed and would be beneficial. Our micro-benchmarks show that this
>> is largely true if you manipulate LOTS of data. For smaller data sets,
>> even the built-in Java collections with boxed types do surprisingly well
>> (thanks in part to HotSpot optimizations).
>>
>> 5) There are subtle differences in how HPPC is written: I use fairly
>> ordinary generic classes with some pseudo-intrinsics and
>> regexp-substituted comments. Sebastiano uses the C preprocessor to
>> generate Java classes from templates (yes, wicked).
>>
>> I look at the Lucene and Solr source code and learn a LOT from the
>> folks contributing to this project, so HPPC will hardly be any faster
>> or better than what Lucene already has, but if anybody finds
>> anything in HPPC useful, please take handfuls. I would love for this
>> project to eventually be merged into Mahout, but I intentionally left
>> it in Carrot Search labs for a little while so that the API can
>> stabilize (mostly through our in-house experiments).
>>
>> Thanks for showing your interest!
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
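Points 2 and 3 of the quoted message (open internals, assert-only checks) can be illustrated with a minimal sketch. `OpenIntList`, `buffer`, `elementsCount` and `sum` are hypothetical names chosen for illustration. This is not HPPC's actual API, only a class written in the same spirit.

```java
import java.util.Arrays;

/**
 * Hypothetical primitive list illustrating open internals and
 * assert-only argument checks. Not HPPC's actual API.
 */
final class OpenIntList {
    /** Backing storage, deliberately public; only the first elementsCount slots are valid. */
    public int[] buffer = new int[4];
    /** Number of valid elements in buffer, deliberately public. */
    public int elementsCount;

    void add(int value) {
        if (elementsCount == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2); // grow by doubling
        }
        buffer[elementsCount++] = value;
    }

    int get(int index) {
        // Checked only when the JVM runs with -ea. With assertions disabled, an index in
        // [elementsCount, buffer.length) silently returns a stale or zero slot.
        assert index >= 0 && index < elementsCount : "index out of range: " + index;
        return buffer[index];
    }

    /** What any caller can do with open internals: scan the raw array, no iterator, no boxing. */
    static long sum(OpenIntList list) {
        final int[] b = list.buffer;
        long total = 0;
        for (int i = 0; i < list.elementsCount; i++) {
            total += b[i];
        }
        return total;
    }

    public static void main(String[] args) {
        OpenIntList list = new OpenIntList();
        for (int v : new int[] {5, 3, 9, 1, 7}) {
            list.add(v); // the fifth add grows buffer from 4 to 8 slots
        }
        Arrays.sort(list.buffer, 0, list.elementsCount); // in-place sort of the internal array
        System.out.println(list.get(0) + " " + sum(list)); // prints "1 25"
    }
}
```

The trade-off is the one the message describes. Callers get array-speed access and can reuse `java.util.Arrays` utilities on the storage directly. But nothing stops them from corrupting the list (for example, by writing past `elementsCount`), and misuse surfaces only when assertions are enabled.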
