mahout-user mailing list archives

From Royi Ronen <ronen.r...@gmail.com>
Subject tfidf vectors are generated without data
Date Sun, 21 Aug 2011 20:30:36 GMT
Hi everybody,

I am trying to run k-means clustering on my own data.

I modified NewsKMeansExample from the Mahout book, to read some of my
documents.

I can see that the following have been created correctly:

tokenized-documents/part-m-00000
df-count/part-r-00000
tf-vectors/part-r-00000

The numbers match the input exactly.
The dictionary and frequency files are also OK.

However, the tfidf-vectors seem to have an empty vector for each document.
Reading them gives (e.g., for document id2):

id2 => {"class":"org.apache.mahout.math.SequentialAccessSparseVector","vector":"{\"values\":{\"indices\":[],\"values\":[],\"numMappings\":0},\"size\":4968,\"lengthSquared\":-1.0}"}

Clustering results in the following:

0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
.....

Any help regarding how to get meaningful tf-idf vectors will be much
appreciated :)

Thanks!
