mahout-user mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: syntheticcontroldata clustering example failure due to combiner
Date Thu, 11 Jun 2009 17:06:45 GMT
So what are you guys doing to get from an unpredictable number of
canopies to a 'k' value for k-means and an initial assignment of each
item to one cluster?


On Thu, Jun 11, 2009 at 12:49 PM, Adil Aijaz<adil@yahoo-inc.com> wrote:
> Jeff,
>
> Thanks for the quick turnaround on this issue. Just tested it and the canopy
> creation and kmeans both work now on syntheticcontroldata. I get 7 canopies
> and 7 clusters. Collection logic in close() is not pretty but can't think of
> a workaround myself.
>
> adil
>
> Jeff Eastman wrote:
>>
>> r783617 removed the CanopyCombiner and refactored its semantics back into
>> the reducer. Updated unit tests pass and Synthetic Control with Canopy
>> produces 6 clusters. Kmeans also runs and produces 6 clusters. I really
>> don't like doing stuff in close() but see no practical alternative. Ideas
>> are still welcomed.
>>
>> Jeff
>>
>>
>> Jeff Eastman wrote:
>>>
>>> Adil Aijaz wrote:
>>>>
>>>> 2. There is a bug in
>>>> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
>>>> that calls runJob from the main function with my provided arguments transposed.
>>>> So, my convergenceDelta was interpreted as t1, t1 as t2, and t2 as
>>>> convergenceDelta. I will commit a patch as soon as I get approval for
>>>> open-source commits from my employer. However, I thought I'd put it out there
>>>> in case someone else is running into the same issue.
>>>>
>>> r783585 fixed the parameter ordering bug. Still working on the Combiner
>>> problem.
>>>
>>> Thanks Adil,
>>> Jeff
>>>
>>>
>>
>
>
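
On the question at the top of this message: the results quoted above (7 canopies and 7 clusters in one run, 6 and 6 in another) suggest that the number of canopies is used directly as k, with the canopy centers serving as the initial k-means centroids. The following is only a minimal sketch of that idea, not Mahout's implementation. The class and method names are hypothetical, and canopy selection is simplified to a single pass using only the tighter threshold t2 (the looser t1 threshold mentioned in the thread is left out).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Mahout.
public class CanopySeedSketch {

    // Euclidean distance between two points of equal dimension.
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Simplified canopy selection (assumption: only t2 is used).
    // A point starts a new canopy only if it lies farther than t2
    // from every canopy center chosen so far.
    public static List<double[]> canopyCenters(List<double[]> points, double t2) {
        List<double[]> centers = new ArrayList<>();
        for (double[] p : points) {
            boolean covered = false;
            for (double[] c : centers) {
                if (distance(p, c) <= t2) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                centers.add(p.clone());
            }
        }
        return centers;
    }

    // Initial assignment: each point goes to the index of its nearest center.
    public static int[] assign(List<double[]> points, List<double[]> centers) {
        int[] assignment = new int[points.size()];
        for (int i = 0; i < points.size(); i++) {
            int best = 0;
            double bestDist = distance(points.get(i), centers.get(0));
            for (int j = 1; j < centers.size(); j++) {
                double d = distance(points.get(i), centers.get(j));
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            assignment[i] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0.0, 0.0},
                new double[] {0.5, 0.0},
                new double[] {10.0, 10.0},
                new double[] {10.5, 10.0});
        List<double[]> centers = canopyCenters(points, 2.0);
        int k = centers.size(); // the canopy count becomes k for k-means
        int[] assignment = assign(points, centers);
        System.out.println("k = " + k);
        System.out.println("initial assignment = " + Arrays.toString(assignment));
    }
}
```

In this sketch k is never chosen up front: it falls out of the data and the t2 threshold, which matches the matching canopy and cluster counts reported in the thread.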