mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Hash-coded Vectorization and bogus information
Date Sun, 12 Feb 2012 15:00:39 GMT
Hash-coded vectorization *is* a random projection.  It is just one that
preserves some degree of sparsity.  It definitely loses information when
you use it to decrease the dimension of the input.  It does not "add bogus
information".
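
To make the point concrete, here is a minimal sketch of hashed
vectorization in Python.  It is not Mahout's encoder code; the names
`hash_vectorize` and `_hash`, the use of MD5, and the +1/-1 sign hash are
all assumptions chosen for illustration.  Each term is sent to one fixed
pseudo-random coordinate, so the mapping is linear in the term counts (a
projection), the output stays sparse, and distinct terms can collide once
`dim` is smaller than the vocabulary.

```python
import hashlib
from collections import Counter


def _hash(token, salt):
    # Deterministic 64-bit hash of (salt, token); MD5 is an arbitrary choice here.
    digest = hashlib.md5(f"{salt}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_vectorize(tokens, dim):
    """Map a bag of tokens to a sparse vector {index: value} with `dim` coordinates."""
    vec = {}
    for token, count in Counter(tokens).items():
        index = _hash(token, "index") % dim                      # which coordinate
        sign = 1.0 if _hash(token, "sign") % 2 == 0 else -1.0    # pseudo-random sign
        vec[index] = vec.get(index, 0.0) + sign * count
    return vec


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog".split()
    # At most one nonzero coordinate per distinct term, however large dim is.
    print(hash_vectorize(doc, dim=1000))
    # With dim far below the vocabulary size, terms must share coordinates,
    # which is where information is lost.
    print(hash_vectorize(doc, dim=4))
```

Nothing is invented in the output: every nonzero value is a signed sum of
counts of real terms.  Collisions merge terms; they do not create new ones.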

SGD doesn't like dense vectors, actually.  In fact, one of the nice
properties of SGD is that it does sparse updates well.
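
As an illustration of the sparse-update property, here is a minimal sketch
of one SGD step for logistic regression in Python.  The function name
`sgd_step`, the learning rate, and the absence of a regularization term
are assumptions for the example, not Mahout's implementation.  The step
touches only the weights whose features are nonzero in the example, so
its cost scales with the number of nonzeros rather than with the
dimension.

```python
import math


def sgd_step(weights, features, label, learning_rate=0.1):
    """One logistic-regression SGD step on a sparse example.

    weights  : list of floats, one per coordinate (dense model)
    features : dict {index: value} holding only the nonzero features
    label    : 0 or 1
    Returns the prediction made before the update.
    """
    # The dot product only visits the nonzero features.
    margin = sum(weights[i] * x for i, x in features.items())
    margin = max(-30.0, min(30.0, margin))  # clamp to keep exp() in range
    prediction = 1.0 / (1.0 + math.exp(-margin))
    error = label - prediction
    # The gradient is zero wherever the feature is zero, so those weights are skipped.
    for i, x in features.items():
        weights[i] += learning_rate * error * x
    return prediction


if __name__ == "__main__":
    weights = [0.0] * 100000
    example = {3: 1.0, 17: 2.0}   # two nonzeros out of 100000 coordinates
    sgd_step(weights, example, label=1)
    print(sum(1 for w in weights if w != 0.0), "weights changed")
```

With a dense input the same loop would have to visit every coordinate on
every example.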

On Sat, Feb 11, 2012 at 9:37 PM, Lance Norskog <goksron@gmail.com> wrote:

> Does hash-coded vectorization add bogus information compared to sparse
> term vectors? A more concrete question: would a random projection on
> the sparse vector give a "better quality" dense vector? (This is in
> the context of SGD classification, which "likes" dense vectors.)
>
> --
> Lance Norskog
> goksron@gmail.com
>
