mahout-user mailing list archives

From Andy Schlaikjer
Subject Re: Error: Java heap space on mahout cvb command
Date Fri, 25 May 2012 23:33:44 GMT
On Fri, May 25, 2012 at 4:25 PM, DAN HELM wrote:

> Out of curiosity, if one were to cluster 1 million documents, what would
> be a reasonable k?  I guess it depends on the nature of the data (domain)
> and application but it would seem if k is too small then the clusters would
> be way too fat and noisy.

Sounds like a reasonable hypothesis. Generally, k depends completely on
your application. You'll have to tune based on what you want to accomplish
with the trained model.
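
One rough way to see the "too small a k gives fat clusters" effect is to
sweep k on a small sample and watch how the within-cluster scatter changes.
The sketch below is only an illustration: it uses a tiny 1-D k-means on
made-up data as a stand-in, not Mahout's cvb, and the names kmeans_1d, data
and scatter are hypothetical.

```python
import random

def kmeans_1d(points, k, iters=20):
    """Tiny Lloyd's k-means on 1-D data.

    Returns (centers, within-cluster sum of squared distances).
    """
    pts = sorted(points)
    # Deterministic init: place the starting centers at evenly spaced quantiles.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            # Assign each point to its nearest center.
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            groups[j].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    wcss = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, wcss

# Synthetic sample (an assumption): three well-separated groups of 100 points.
rng = random.Random(0)
data = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0) for _ in range(100)]

# Sweep a few candidate values of k and record the scatter for each.
scatter = {k: kmeans_1d(data, k)[1] for k in (1, 2, 3, 6)}
for k, wcss in scatter.items():
    print(f"k={k}: within-cluster scatter = {wcss:.1f}")
```

On data like this the scatter drops sharply until k reaches the number of
real groups. On real documents there is rarely such a clean break, so the
final choice still comes down to what the application needs.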

