> That does seem like a long time.
>
> Is your data sparse or dense?

I would say sparse. My vectors are high-dimensional and most of their values are zero.

> Perhaps a larger convergence value might help (-d, I believe).

I'll try that.

> Is there any chance your data is publicly shareable?  Come to think of it,
> with the vector representations, as long as you don't publish the key (which
> terms map to which index), I would think most all data is publicly
> shareable.

I'm sorry, I don't quite understand what you're asking. Publicly shareable? As in user permissions to access/read/write the data?

> Are you on trunk of Mahout?  I think we still need more profiling to get a
> better idea of where improvements can be made.

I am. Updated this morning. I still think this is a configuration issue, and I have never considered Mahout's algorithm implementations to be the actual cause of the poor performance. For now, I've been running k-means exclusively. Perhaps I should try different clustering methods and see whether they take a similar amount of time to complete.
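To illustrate why a larger convergence value can shorten a k-means run, here is a minimal sketch in Python. It is not Mahout's code: it assumes plain Lloyd iterations with Euclidean distance, sparse vectors stored as `{index: value}` dicts (only nonzero entries), and a convergence rule in the spirit of the delta discussed above, where the run stops once every center moves no more than `convergence_delta` in one iteration. The function names (`sq_dist`, `mean`, `kmeans`) are hypothetical. Because the centers follow the same trajectory for a given starting point, a larger delta can only stop the loop at the same iteration or earlier.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse representation: index -> nonzero value.
SparseVec = Dict[int, float]


def sq_dist(a: SparseVec, b: SparseVec) -> float:
    """Squared Euclidean distance, visiting only indices nonzero in a or b."""
    keys = a.keys() | b.keys()
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


def mean(vectors: List[SparseVec]) -> SparseVec:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: SparseVec = {}
    for v in vectors:
        for k, x in v.items():
            total[k] = total.get(k, 0.0) + x
    n = len(vectors)
    return {k: x / n for k, x in total.items()}


def kmeans(points: List[SparseVec], centers: List[SparseVec],
           convergence_delta: float,
           max_iter: int = 100) -> Tuple[List[SparseVec], int]:
    """Lloyd's k-means; returns (final centers, iterations run).

    Stops when every center moved at most convergence_delta
    (Euclidean) in one iteration, or after max_iter iterations.
    """
    centers = [dict(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        clusters: List[List[SparseVec]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Convergence test: did every center move at most the delta?
        converged = all(math.sqrt(sq_dist(old, new)) <= convergence_delta
                        for old, new in zip(centers, new_centers))
        centers = new_centers
        if converged:
            return centers, iteration
    return centers, max_iter
```

Usage, with a toy high-dimensional sparse dataset and a deliberately poor starting point:

```python
pts = [{0: 1.0}, {0: 1.2}, {0: 0.8, 5: 0.1},
       {999: 5.0}, {999: 5.5}, {999: 4.5, 3: 0.2}]
init = [{0: 1.0}, {0: 1.2}]
for delta in (0.0, 0.5, 5.0):
    _, iters = kmeans(pts, init, delta)
    print(delta, iters)
```

The trade-off is that a loose delta can stop before the centers have settled, so it is worth checking cluster quality as well as runtime.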