From: John Wang
To: dev@lucene.apache.org
Date: Wed, 21 Apr 2010 01:10:25 -0700
Subject: Re: HPPC: High Performance Primitive Collections for Java

Hi Dawid:

Any performance comparisons with fastutil?
Thanks

-John

On Mon, Apr 19, 2010 at 1:11 PM, Dawid Weiss <dawid.weiss@gmail.com> wrote:
> Hmmm.. can anybody compare these to fastutil?

I believe I can answer some of your questions.

1) HPPC is not directly Java Collections-compatible. It does have an
interface hierarchy, but it's not a descendant of the familiar Set,
Map or List. fastutil is Collections-compatible.

2) HPPC has open internals, so you can do anything you like once your
collections are created, including manipulation of internal storage
arrays, for instance. This was a design decision and goal. As with any
sharp object, improper use may cause harm.

3) HPPC uses assertions instead of always-on condition checks. There are no
attempts to detect misuse (fail-fast iterators, etc.).

4) fastutil is more mature, has support for more data structures
(sorted trees, etc.) and was written by an excellent programmer
(Sebastiano Vigna). HPPC was created internally for use at Carrot
Search and was primarily motivated by speed; we believed that in
certain applications direct access to collections' internals should be
allowed and should be beneficial. Our micro-benchmarks show that this
is largely true if you manipulate LOTS of data. For smaller data sets
even built-in Java collections with boxed types do surprisingly well
(due to HotSpot optimizations too).

5) There are subtle differences in how HPPC is written -- I use pretty
much normal generic classes with some pseudo-intrinsics and
regexp-substituted comments. Sebastiano uses the C++ preprocessor to
generate Java classes from templates (yes, wicked).
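To make point 1 concrete, here is a minimal sketch assuming a hypothetical IntList class (it is not HPPC's or fastutil's actual API). The list stores raw ints and implements no java.util interface; a Collections-compatible view can still be layered on top with java.util.AbstractList<Integer>, at the cost of boxing every element that passes through the view.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Hypothetical primitive list: stores raw ints, implements no java.util interface.
final class IntList {
    private int[] buffer = new int[8];
    private int size;

    void add(int value) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2); // grow by doubling
        }
        buffer[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return buffer[index];
    }

    int size() {
        return size;
    }

    // Collections-compatible view: every get() boxes an int into an Integer.
    List<Integer> asList() {
        return new AbstractList<Integer>() {
            @Override
            public Integer get(int index) {
                return IntList.this.get(index); // autoboxing happens here
            }

            @Override
            public int size() {
                return IntList.this.size();
            }
        };
    }
}

class CompatDemo {
    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 0; i < 20; i++) list.add(i * i);
        // The view can be passed to any API that expects a java.util.List.
        List<Integer> view = list.asList();
        System.out.println(view.size() + " " + view.get(3) + " " + view.contains(81)); // prints "20 9 true"
    }
}
```

The design choice in point 1 is essentially whether this boxing view is the primary interface (Collections-compatible) or an optional extra.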
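Point 2 (open internals) can be illustrated with a hypothetical OpenIntList whose backing array and element count are public fields. The names buffer and elementsCount are chosen for illustration only and should not be read as a specification of HPPC's classes. Code outside the class can scan or edit the raw array directly, which avoids iterators and boxing but leaves it to the caller to stay below elementsCount and to re-read buffer after the list grows.

```java
import java.util.Arrays;

// Hypothetical "open" list: internals are public on purpose.
final class OpenIntList {
    public int[] buffer = new int[4];
    public int elementsCount; // number of valid slots at the start of buffer

    public void add(int value) {
        if (elementsCount == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2); // replaces the array
        }
        buffer[elementsCount++] = value;
    }
}

class OpenInternalsDemo {
    // Sums the list by scanning the backing array directly:
    // no iterator, no boxing, no per-element method call.
    static long sum(OpenIntList list) {
        final int[] buf = list.buffer; // a reference kept across add() calls may go stale after growth
        long total = 0;
        for (int i = 0; i < list.elementsCount; i++) {
            total += buf[i];
        }
        return total;
    }

    public static void main(String[] args) {
        OpenIntList list = new OpenIntList();
        for (int i = 1; i <= 10; i++) list.add(i);

        // In-place bulk edit through the raw array: double every element.
        for (int i = 0; i < list.elementsCount; i++) list.buffer[i] *= 2;

        System.out.println(sum(list)); // prints 110 (2 + 4 + ... + 20)

        // The sharp edge: buffer.length is 16 here, so looping to buffer.length
        // instead of elementsCount would also visit 6 unused slots, which would
        // corrupt, for example, a minimum or a count.
    }
}
```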
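For point 3, a minimal sketch (hypothetical class CheckStyles, not HPPC code) contrasting an always-on bounds check with an assert-based one. Java evaluates assert statements only when the JVM runs with assertions enabled (the -ea flag); by default they are skipped, so a bad index passed to getAsserted that still falls inside the array's capacity returns whatever happens to be in that slot instead of failing.

```java
// Hypothetical list showing two validation styles for the same read.
final class CheckStyles {
    private final int[] buffer = new int[16]; // fixed capacity of 16
    private int size;

    void add(int v) { buffer[size++] = v; } // no growth: sketch only

    // Always-on check: every call pays for the comparison, misuse always throws.
    int getChecked(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return buffer[index];
    }

    // Assertion-style check: evaluated only when the JVM is started with -ea,
    // otherwise the condition is skipped entirely.
    int getAsserted(int index) {
        assert index >= 0 && index < size : "index " + index + ", size " + size;
        return buffer[index];
    }
}

class CheckStylesDemo {
    public static void main(String[] args) {
        CheckStyles list = new CheckStyles();
        list.add(42);
        try {
            list.getChecked(5);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("checked: " + e.getMessage()); // always reached
        }
        try {
            // Without -ea this returns the unused slot's value (0);
            // with -ea it throws AssertionError instead.
            System.out.println("asserted: " + list.getAsserted(5));
        } catch (AssertionError e) {
            System.out.println("asserted (with -ea): " + e.getMessage());
        }
    }
}
```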
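The micro-benchmark remark in point 4 can be explored with a naive timing sketch like the one below; the class name and data size are arbitrary choices, not the benchmarks Dawid refers to. It compares summing an int[] against a List<Integer>. Timings taken with System.nanoTime like this are easily distorted by JIT warm-up and garbage collection, so a harness such as JMH is the more trustworthy tool, and no particular numbers should be expected from this code.

```java
import java.util.ArrayList;
import java.util.List;

// Naive comparison of summing a primitive int[] versus a List<Integer>.
// Not a rigorous benchmark: results vary with JVM, heap size and warm-up.
class BoxedVsPrimitive {
    static long sumPrimitive(int[] data) {
        long total = 0;
        for (int v : data) total += v;
        return total;
    }

    static long sumBoxed(List<Integer> data) {
        long total = 0;
        for (Integer v : data) total += v; // unboxing on every element
        return total;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 2_000_000;
        int[] primitive = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed.add(i); // by default, values above 127 box into separate Integer objects
        }
        for (int round = 0; round < 5; round++) { // repeated rounds give the JIT a chance to compile both loops
            long t0 = System.nanoTime();
            long a = sumPrimitive(primitive);
            long t1 = System.nanoTime();
            long b = sumBoxed(boxed);
            long t2 = System.nanoTime();
            System.out.printf("round %d: int[] %d ms, List<Integer> %d ms (sums equal: %b)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
        }
    }
}
```

Varying n from small to large is one way to probe the claim that boxed collections hold up well on small data sets but fall behind on LOTS of data.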
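Point 5 mentions generating primitive specializations from ordinary generic classes by regular-expression substitution. A minimal sketch of that idea, assuming a placeholder type name KType and two substitution rules chosen purely for illustration (HPPC's real build, including its comment-driven substitutions and pseudo-intrinsics, is not reproduced here):

```java
// Minimal sketch of regex-driven specialization of a generic "template" class.
// The placeholder name KType and the rules below are illustrative assumptions,
// not HPPC's actual code generator.
class TemplateSpecializer {
    static String specialize(String template, String typeName, String classPrefix) {
        return template
                .replace("<KType>", "")                  // drop the type-parameter declaration
                .replaceAll("KType(?=\\w)", classPrefix) // KTypeStack -> IntStack
                .replaceAll("\\bKType\\b", typeName);    // KType[] -> int[], KType e -> int e
    }

    public static void main(String[] args) {
        // The template is itself a valid generic Java class.
        String template = String.join("\n",
                "public class KTypeStack<KType> {",
                "  KType[] buffer;",
                "  int size;",
                "  void push(KType e) { buffer[size++] = e; }",
                "}");
        System.out.println(specialize(template, "int", "Int"));
    }
}
```

For the sample template, specialize(template, "int", "Int") yields an IntStack class whose buffer is an int[] and whose push takes an int. The lookahead in KType(?=\w) is what separates the class-name prefix (KTypeStack) from standalone uses of the type (KType[], KType e), which the \b-anchored rule then rewrites.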

I look at Lucene and SOLR source code and learn a LOT from folks
contributing to this project, so HPPC will hardly be any faster or
better than what Lucene already has, but if anybody finds
anything in HPPC useful, please take handfuls. I would love for this
project to finally be merged with Mahout, but I intentionally left it in
Carrot Search labs for a little while so that the API can stabilize
(through our in-house experiments mostly).

Thanks for showing your interest!
Dawid
