mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robin Anil <robin.a...@gmail.com>
Subject Re: Performance of primitive collections
Date Thu, 06 Jun 2013 01:13:47 GMT
Dawid, could you explain how this is accessing the mahout trunk code? Is
that part of the package step?

Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.


On Wed, Jun 5, 2013 at 8:20 PM, Sean Owen <srowen@gmail.com> wrote:

> On Wed, Jun 5, 2013 at 7:00 PM, Dawid Weiss
> <dawid.weiss@cs.put.poznan.pl> wrote:
> > 1) if you know your keys are 0...N-1 (dense) then there is typically no
> > need to use a map; you'll do just as well with a regular lookup array :)
>
> Yeah of course... here I think there is no guarantee of this although
> in many practical use cases the keys are drawn from a small range. For
> example 1000 user IDs are often around the range 1 - 1000 or
> something.
>
>
> > Having written the above it made me wonder why Mahout's performance does
> > not drop significantly on those malicious upper-bit changing benchmarks,
> so
> > I checked the code. The thing is Mahout (and I think Colt, originally)
> uses
> > double hashing with prime-sized tables (not 2^n-1 masking) -- this is
> nice
> > and indeed avoids the problems of clustering mentioned in pt. 2 above.
>
> Yes I thought this was why you could largely get away with it. If the
> table is big enough, collisions and chains should not be a big deal to
> begin with.
>
> Since you have this benchmark readily available -- what happens if you
> un-comment one or both of the lines of code that mixes the hash? That
> would really answer it.
>
> i'm curious as I have certainly made this assumption independently in
> the custom hashtable implementations I made long ago (which also use
> double hashing) and still use in the project. If it's not optimal
> that's important info.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message