mahout-dev mailing list archives

From Jake Mannix <>
Subject Re: Retrieving labels for indexes?
Date Tue, 08 Dec 2009 05:48:27 GMT
This brings up a point about our linear primitives: are 32-bit integers big
enough for our index range for vectors and matrices?  In particular,
having billions of rows is completely possible, even if it is on the large
side.

If we want to be about "scalable" machine learning, we really don't want to
seal ourselves into "only" 2 billion x 2 billion matrices in the long run,
do we?

How hard would it be to promote our ints to longs?
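
To make the limit concrete, here is a minimal sketch of what happens when a row index larger than about 2 billion is forced into an int. The class and method names (IndexRange, narrow, fitsInInt) are hypothetical and only illustrate plain Java integer behavior; they are not part of Mahout.

```java
// Hypothetical illustration only; not Mahout code.
class IndexRange {

    // Narrowing a long to an int keeps only the low 32 bits,
    // so values above Integer.MAX_VALUE (2147483647) wrap around.
    static int narrow(long index) {
        return (int) index;
    }

    // True if the index can be used as a non-negative int index unchanged.
    static boolean fitsInInt(long index) {
        return index >= 0 && index <= Integer.MAX_VALUE;
    }
}
```

For example, IndexRange.narrow(3_000_000_000L) yields -1294967296, which is why a 3-billion-row matrix cannot be addressed with int indexes without some remapping.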


On Sat, Dec 5, 2009 at 4:48 AM, Sean Owen <> wrote:

> I'm trying to use Vectors to represent a vector of user preferences.
> All is well since items are numeric and can be used as indexes into a
> Vector -- almost. I have longs, and of course indexes are ints.
> I could fold the long IDs into ints without too much worry about the
> effects of collision. However I still need to remember the original
> item IDs for each index. I could do it with labels, but I can't
> retrieve the label for an index (and the other mapping isn't
> serialized anyway?).
> So I guess I must separately store this mapping? Just making sure I'm
> not missing something.
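
One way to picture the separate mapping described above is sketched below. It folds each long ID into a non-negative int and keeps a side table from index back to the original ID. Everything here (the LongIdIndexMap class, the XOR fold, the first-ID-wins collision policy) is an assumption made for illustration, not the approach actually taken in Mahout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: folds 64-bit item IDs into
// non-negative 32-bit indexes and remembers which ID produced each index.
class LongIdIndexMap {

    // Reverse mapping from vector index back to the original item ID.
    private final Map<Integer, Long> indexToId = new HashMap<>();

    // Fold a long into a non-negative int by XOR-ing its high and low halves
    // and clearing the sign bit. Distinct IDs can collide.
    static int fold(long id) {
        return (int) (id ^ (id >>> 32)) & Integer.MAX_VALUE;
    }

    // Returns the index for the given ID and records the reverse mapping.
    // On a collision, the first ID registered for an index is kept.
    int indexFor(long id) {
        int index = fold(id);
        indexToId.putIfAbsent(index, id);
        return index;
    }

    // Returns the original ID recorded for this index, or null if none.
    Long idFor(int index) {
        return indexToId.get(index);
    }
}
```

The side table would have to be stored alongside the vectors, since nothing in the folded index itself allows the original ID to be recovered.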
