On Jul 29, 2009, at 9:07 AM, nfantone wrote:
> Grant, I took a look at your patch. It seems as though you did
> something similar to what I did. However, I believe that there's still
> room for improvement as there are things being calculated
> unnecessarily for no apparent reason. Could you please read my
> previous post? At least the "excursus" bit. I may be totally wrong,
> though: some particular parts were a bit obscure to me. Perhaps you
> (or Shashikant) can shed some light on them? We might be able to
> release a bigger/better patch.
Agreed, can you put your changes up as a patch on MAHOUT-121? That
way we can do file diffs, etc.
>
>>> I think your data set ran, for 10 iterations, in just over 2 minutes
>>> and that was with the profiler hooked up, too.
>
> Um... I also did that and, while it was considerably faster than
> before, it took about 2 hours to complete (it used to take days, mind
> you), using a 4-node Hadoop cluster. The actual vector clustering
> alone, that is, the final step, took just over an hour:
>
> Started at: Tue Jul 28 17:44:20 ART 2009
> Finished at: Tue Jul 28 18:46:24 ART 2009
> Finished in: 1hrs, 2mins, 4sec
>
> How exactly did you launch the job? What convergence delta did you
> choose? How many clusters did you set up initially?
--input ../nfantone/user.data --clusters ../nfantone/output/clusters
--k 10 --output ../content/nfantone/output/ --convergence 0.01 --overwrite
So, it wasn't exactly what you were running. I will try to run yours
at some point.
-Grant