mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: syntheticcontroldata clustering example failure due to combiner
Date Thu, 11 Jun 2009 19:54:19 GMT
For L_2 centroids, you just have to have the mapper emit a trivial sum (the
point itself) and a count (of 1).  The combiner should take a list of vector
sums and counts and produce a combined sum and count.

Then the reducer will get a list of sums and counts; it should add them
together and divide the total sum by the total count.

(just like n-dimensional word count!)
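
A minimal sketch of that pattern in plain Java, with no Hadoop or Mahout types involved. The names CentroidSketch, Partial, map, combine and reduce are made up for illustration and are not Mahout's API; points are assumed to be double[] arrays of equal length that have already been assigned to a single cluster.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: simulates the mapper, combiner and reducer for one cluster key. */
final class CentroidSketch {

    /** Partial result: component-wise vector sum plus the number of points summed. */
    static final class Partial {
        final double[] sum;
        final long count;

        Partial(double[] sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Mapper step: a single point is a trivial sum with a count of 1. */
    static Partial map(double[] point) {
        return new Partial(point.clone(), 1L);
    }

    /** Combiner step: add sums and counts. Output has the same shape as the input. */
    static Partial combine(List<Partial> partials) {
        if (partials.isEmpty()) {
            throw new IllegalArgumentException("need at least one partial");
        }
        double[] sum = new double[partials.get(0).sum.length];
        long count = 0L;
        for (Partial p : partials) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += p.sum[i];
            }
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Reducer step: add whatever arrives, then divide the total sum by the total count. */
    static double[] reduce(List<Partial> partials) {
        Partial total = combine(partials);
        double[] centroid = new double[total.sum.length];
        for (int i = 0; i < centroid.length; i++) {
            centroid[i] = total.sum[i] / total.count;
        }
        return centroid;
    }

    public static void main(String[] args) {
        // Four points of one cluster, split across two simulated map tasks.
        Partial a = combine(Arrays.asList(map(new double[] {1, 2}), map(new double[] {3, 4})));
        Partial b = combine(Arrays.asList(map(new double[] {5, 6}), map(new double[] {7, 8})));
        System.out.println(Arrays.toString(reduce(Arrays.asList(a, b)))); // prints [4.0, 5.0]
    }
}
```

Because combine() returns the same kind of value it consumes, the result does not depend on whether the combiner runs zero, one, or several times before the reducer sees the data.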

On Thu, Jun 11, 2009 at 9:49 AM, Adil Aijaz <adil@yahoo-inc.com> wrote:

> Jeff,
>
> Thanks for the quick turnaround on this issue. Just tested it and the
> canopy creation and kmeans both work now on syntheticcontroldata. I get 7
> canopies and 7 clusters. Collection logic in close() is not pretty but can't
> think of a workaround myself.
>
> adil
>
>
> Jeff Eastman wrote:
>
>> r783617 removed the CanopyCombiner and refactored its semantics back into
>> the reducer. Updated unit tests pass and Synthetic Control with Canopy
>> produces 6 clusters. Kmeans also runs and produces 6 clusters. I really
>> don't like doing stuff in close() but see no practical alternative. Ideas
>> are still welcomed.
>>
>> Jeff
>>
>>
>> Jeff Eastman wrote:
>>
>>> Adil Aijaz wrote:
>>>
>>>> 2. There is a bug in
>>>> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
>>>> that called runJob from main function with my provided arguments transposed.
>>>> So, my convergenceDelta was interpreted as t1, t1 as t2, and t2 as
>>>> convergenceDelta. I will commit a patch as soon as I get approval for
>>>> opensource commits from my employer, however, I thought I'd put it out there
>>>> in case someone else is going through the same issue.
>>>>
>>> r783585 fixed the parameter ordering bug. Still working on the Combiner
>>> problem.
>>>
>>> Thanks Adil,
>>> Jeff
>>>
>>>
>>>
>>
>


-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
