mahout-user mailing list archives

From Dan Filimon
Subject Vectorizing 20 newsgroups
Date Thu, 27 Dec 2012 19:53:09 GMT

I'm finally getting back to work on Streaming KMeans! :)
The last thing I did was experiment with different ways of vectorizing
the 20 newsgroups data set, and I wanted to project the resulting
vectors into 3D and check out what I get.

The result is pretty odd, but I get it regardless of the method I use
to generate vectors.
It looks like someone splashed a 2D normal distribution on a sphere.

Here's an image from Ted's algorithm [2] and one from mine [3] using
log term-frequency scoring.
Ted's uses vectors of size 9000 with hashing (using
StaticWordValueEncoder) while mine uses vectors of size ~90000 with a
manual approach.
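
For readers unfamiliar with the hashed variant, here is a minimal sketch of
hashed log term-frequency vectorization. It is not Mahout's
StaticWordValueEncoder and not the code behind either image: the function
name hashed_log_tf, the MD5-based index, and the log(1 + tf) weight are all
assumptions chosen for illustration, since the message does not give the
exact scoring formula.

```python
import hashlib
import math
from collections import Counter


def hashed_log_tf(tokens, dim=9000):
    """Map a token list to a dense list of length `dim` via the hashing trick.

    Hypothetical helper: the weight log(1 + tf) is an assumed form of
    "log term-frequency scoring".
    """
    vec = [0.0] * dim
    for term, tf in Counter(tokens).items():
        # Use a stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(term.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % dim
        # Colliding terms simply add their weights into the same slot.
        vec[index] += math.log(1.0 + tf)
    return vec


doc = "the quick brown fox jumps over the lazy dog the end".split()
v = hashed_log_tf(doc)
print(len(v), sum(1 for x in v if x != 0.0))
```

With a smaller dim more terms collide into the same slot, which is the
trade-off between the size 9000 hashed vectors and the ~90000 dictionary
style vectors.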

I think the vectorization actually went okay for both algorithms, but
maybe the projection is off?
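
One way to check the projection step in isolation is to run a plain
Gaussian random projection on synthetic sparse vectors and compare the
result with and without normalization. The sketch below is an assumption
about how such a check could look, not the projection used for the images;
gaussian_projection, project, and unit are hypothetical helper names. If the
projected points are rescaled to unit length afterwards, they all land on a
sphere by construction, which may be worth ruling out.

```python
import math
import random


def gaussian_projection(dim, out_dim=3, seed=42):
    """Build an out_dim x dim matrix of scaled N(0, 1) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)  # keeps expected squared norm unchanged
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(out_dim)]


def project(vec, matrix):
    """Multiply the projection matrix by a dense vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def unit(vec):
    """Rescale to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


dim = 9000
matrix = gaussian_projection(dim)

# Synthetic sparse "documents" with varying numbers of term occurrences.
rng = random.Random(0)
docs = []
for _ in range(5):
    v = [0.0] * dim
    for _ in range(rng.randint(20, 200)):
        v[rng.randrange(dim)] += 1.0
    docs.append(v)

points = [project(v, matrix) for v in docs]
radii = [math.sqrt(sum(c * c for c in p)) for p in points]
on_sphere = [unit(p) for p in points]

print(radii)  # raw projections: radii differ from point to point
print([math.sqrt(sum(c * c for c in p)) for p in on_sphere])  # all ~1.0
```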

The shape is odd. What am I doing wrong? :/
