mahout-dev mailing list archives

From Robin Anil <>
Subject Re: Optimization opportunity: Speed up serialization and deserialization
Date Sun, 02 May 2010 16:49:21 GMT
PS: for a full vector, the SparseVector is larger than the dense vector.
I guess something could be done about it.
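
A rough sketch of why a full sparse vector comes out larger. The byte widths here are assumptions for illustration (a 4-byte int index plus an 8-byte double per sparse entry, an 8-byte double per dense entry), not Mahout's actual serialized layout, and the function names are hypothetical:

```python
# Assumed widths, for illustration only (not taken from Mahout's format).
INDEX_BYTES = 4   # one int index stored per sparse entry
VALUE_BYTES = 8   # one double value stored per entry


def dense_size(cardinality):
    """Bytes for a dense vector: one value per slot, no indices."""
    return cardinality * VALUE_BYTES


def sparse_size(nonzeros):
    """Bytes for a sparse vector: an (index, value) pair per non-zero entry."""
    return nonzeros * (INDEX_BYTES + VALUE_BYTES)


def break_even_fill(cardinality):
    """Fraction of non-zero entries above which sparse is larger than dense."""
    return dense_size(cardinality) / (cardinality * (INDEX_BYTES + VALUE_BYTES))
```

Under these assumptions the per-entry index overhead makes the sparse form larger once roughly two thirds of the entries are non-zero, so a completely full vector always loses.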

On Sun, May 2, 2010 at 10:03 PM, Sean Owen <> wrote:

> That's the one! I actually didn't know this was how PBs did the
> variable length encoding but makes sense, it's about the most
> efficient thing I can imagine.
> Values up to 16,383 fit in two bytes, which is less than a 4-byte int and
> the 3 bytes or so it would take the other scheme. Could add up over
> thousands of elements times millions of vectors.
> Decoding isn't too slow and if one believes this isn't an unusual
> encoding, it's not so problematic to use it in a format that others
> outside Mahout may wish to consume.
> On Sun, May 2, 2010 at 5:23 PM, Robin Anil <> wrote:
> > You mean this type of encoding instead?
> >
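
For reference, a minimal sketch of the base-128 variable-length encoding discussed above, the scheme Protocol Buffers uses for unsigned integers: 7 payload bits per byte, with the high bit set on every byte except the last. The function names are hypothetical, and this is not Mahout's serialization code.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint, low-order group first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint from data starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7
```

Two bytes carry 14 payload bits, which is where the 16,383 limit quoted above comes from; 16,384 is the first value that needs a third byte.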
