Hash-coded vectorization *is* a random projection. It is just one that
preserves some degree of sparsity. It definitely loses information when
you use it to decrease the dimension of the input. It does not "add
bogus information".
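To make that concrete, here is a minimal sketch of hash-coded
vectorization (the "hashing trick") as a sparsity-preserving
projection. The function name, the output dimension of 16, and the
choice of CRC32 as the hash are all illustrative, not anything from a
particular library:

```python
import zlib

def hash_vectorize(terms, dim=16):
    """Project a bag of terms into a dim-sized vector by hashing.

    Each term lands in bucket hash(term) % dim; a hash-derived sign
    makes collisions cancel on average instead of always adding.
    """
    v = [0.0] * dim
    for t in terms:
        h = zlib.crc32(t.encode("utf-8"))          # 32-bit hash of the term
        idx = h % dim                              # bucket index
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0 # hash-derived random sign
        v[idx] += sign
    return v

doc = ["hash", "coded", "vectorization", "is", "a", "random", "projection"]
v = hash_vectorize(doc)
# A doc with k distinct terms produces at most k nonzero entries, so
# sparsity is preserved; collisions (two terms sharing a bucket) are
# where information is lost.
```

Note that the output is still sparse when the input is: nothing gets
smeared across all 16 coordinates the way a dense Gaussian projection
would do it.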
SGD doesn't actually prefer dense vectors. In fact, one of the nice
properties of SGD is that it handles sparse updates well.
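As an illustration of why, here is a sketch of one sparse SGD step for
logistic regression: only the coordinates present in the example are
touched, so the cost per step is proportional to the number of nonzero
features, not the full dimension. The function and representation (a
dict of index to value) are my own illustration, not Mahout's code:

```python
import math

def sgd_update(w, x_sparse, y, lr=0.1):
    """One logistic-regression SGD step on a sparse example.

    w is a dict mapping feature index -> weight; x_sparse maps the
    example's nonzero feature indices -> values; y is 0 or 1.
    """
    # Dot product touches only the example's nonzero coordinates.
    margin = sum(w.get(i, 0.0) * v for i, v in x_sparse.items())
    p = 1.0 / (1.0 + math.exp(-margin))   # predicted probability
    g = p - y                             # gradient scale for log loss
    for i, v in x_sparse.items():         # update only those coordinates
        w[i] = w.get(i, 0.0) - lr * g * v
    return w

w = {}
w = sgd_update(w, {3: 1.0, 7: 2.0}, y=1)
# Only weights 3 and 7 exist afterward; the rest of the (possibly huge)
# weight space was never visited.
```

With a dense input vector every step would have to walk all coordinates,
which is exactly the cost the sparse formulation avoids.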
On Sat, Feb 11, 2012 at 9:37 PM, Lance Norskog wrote:
> Does hash-coded vectorization add bogus information compared to sparse
> term vectors? A more concrete question: would a random projection on
> the sparse vector give a "better quality" dense vector? (This is in
> the context of SGD classification, which "likes" dense vectors.)
>
> --
> Lance Norskog
> goksron@gmail.com
>