mahout-user mailing list archives

From Grant Ingersoll
Subject Re: Stopwords work for Solr but not for Mahout
Date Sat, 02 Jan 2010 22:03:44 GMT

On Jan 2, 2010, at 1:27 PM, Bogdan Vatkov wrote:

> Thanks for the Luke hint, I will try it out. But now I have noticed something
> else that is very strange: I ran k-means on 23K+ docs with 50 clusters, and
> the top-term lists for all of them look wrong. I would say that about 90% of
> the top terms are words I barely recognize.
> I did a quick check on one particular term, which sounded strange anyway and
> which appeared in the top terms of 9 of the 50 clusters, and found that it
> has "doc freq" = 2 in the Solr dictionary.
> How is this even possible? With 23,000 docs, a term that is mentioned only
> 2 times shows up as a top term in 9 clusters. I definitely did something
> wrong. Do you have any idea what that could be?

What commands are you running?

Can you share more about your setup or try to reproduce in a much smaller environment?
