mahout-user mailing list archives

From Royi Ronen
Subject tfidf vectors are generated without data
Date Sun, 21 Aug 2011 20:30:36 GMT
Hi everybody,

I am trying to run k-means clustering on my own data.

I modified NewsKMeansExample from the Mahout book, to read some of my

I can see that the following have been created correctly:


The numbers match the input exactly.
The dictionary and frequency files are also OK.

However, the tfidf-vectors seem to have an empty vector for each document.
Reading them gives (e.g., for document id2):

id2 = >

Clustering results in the following:

0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []

Any help on how to get meaningful tf-idf vectors would be much
appreciated :)

