mahout-dev mailing list archives

From "Pat Ferrel (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
Date Mon, 27 Jan 2014 19:48:42 GMT


Pat Ferrel commented on MAHOUT-1030:

The distance should be measured with the same measure that was used to create the clusters, right? Otherwise, how would you know which metric was used?
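To illustrate why the measure matters, here is a small sketch in plain Java. It does not use Mahout's DistanceMeasure classes; `MeasureMatters`, `euclidean`, `cosine`, and `nearest` are hypothetical names chosen for the example. The same point is nearer to one centroid under Euclidean distance and nearer to the other under cosine distance, so a stored distance is only meaningful alongside the measure that produced it.

```java
import java.util.function.ToDoubleBiFunction;

/**
 * Sketch (hypothetical names, not Mahout's API): the "nearest" centroid for a
 * point can change when the distance measure changes.
 */
public class MeasureMatters {

    // Euclidean distance: sqrt(sum over i of (a_i - b_i)^2)
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine distance: 1 - (a . b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Index of the closest centroid under the supplied measure
    static int nearest(double[] p, double[][] centroids,
                       ToDoubleBiFunction<double[], double[]> measure) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = measure.applyAsDouble(p, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] p = {1, 1};
        double[][] centroids = {{2, 0}, {3, 3}};
        System.out.println(nearest(p, centroids, MeasureMatters::euclidean)); // prints 0
        System.out.println(nearest(p, centroids, MeasureMatters::cosine));    // prints 1
    }
}
```

Point (1, 1) is about 1.41 from (2, 0) and 2.83 from (3, 3) in Euclidean terms, but it points in exactly the same direction as (3, 3), so its cosine distance to that centroid is 0.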

Also, the data comes from boolean recommender data (every dimension is 0 or 1). It is the item-item similarity matrix calculated using log-likelihood and looks like the sample below. Andrew and Suneel have copies of it.

Key: 0: Value: {5:0.854718643737767,172:0.8292121703264371,192:0.854718643737767,13:0.8962379566075429,198:0.8962379566075429,20:0.8962379566075429,9:0.8962379566075429,19:0.8962379566075429,201:0.8962379566075429,207:0.8962379566075429,43:0.6976395601234899,193:0.8962379566075429,18:0.8962379566075429,184:0.8962379566075429,209:0.8962379566075429,187:0.854718643737767,2:0.8962379566075429,211:0.854718643737767,27:0.7327087555023397,177:0.854718643737767,183:0.8292121703264371,214:0.8962379566075429,17:0.854718643737767,190:0.854718643737767,176:0.8292121703264371,12:0.8962379566075429,191:0.854718643737767}
Key: 2: Value: {5:0.8962379566075429,12:0.919419979968322,19:0.919419979968322,9:0.919419979968322,17:0.8962379566075429,20:0.919419979968322,18:0.919419979968322,13:0.919419979968322}
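For readers unfamiliar with the log-likelihood step, the following is a minimal sketch of Dunning's log-likelihood ratio (G²) for one item pair's 2x2 co-occurrence table, written in plain Java rather than against Mahout's own classes. `LogLikelihoodSketch` and its methods are hypothetical names. The final `1 - 1/(1 + llr)` squashing into [0, 1) is an assumption about how a raw score might be turned into a similarity like the values above; it has not been checked against the job that produced this matrix.

```java
/**
 * Sketch of the log-likelihood ratio for a 2x2 table of user counts:
 *   k11 = users with both items A and B
 *   k12 = users with A but not B
 *   k21 = users with B but not A
 *   k22 = users with neither
 * Uses unnormalized entropies, so G^2 = 2 * (rowH + colH - matrixH).
 */
public class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy: N*log(N) - sum(k*log(k)), where N = sum(k)
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long k : counts) {
            result += xLogX(k);
            sum += k;
        }
        return xLogX(sum) - result;
    }

    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Guard against small negative results caused by floating-point round-off
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0;
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    // Assumed mapping of an unbounded LLR score into [0, 1)
    static double similarity(double llr) {
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        // Items that always co-occur in 10 users and are jointly absent in 10 others
        double llr = logLikelihoodRatio(10, 0, 0, 10);
        System.out.println(llr);             // 40 * ln 2, about 27.73
        System.out.println(similarity(llr)); // about 0.965
    }
}
```

An independent table such as (1, 1, 1, 1) scores 0, while strong co-occurrence scores high, which is why LLR is a common choice for sparse boolean preference data.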

> Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
> ------------------------------------------------------------------------------------------------
>                 Key: MAHOUT-1030
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering, Integration
>    Affects Versions: 0.7
>            Reporter: Jeff Eastman
>            Assignee: Andrew Musselman
>             Fix For: 0.9
>         Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch
> Looks like this won't make it into this build. It has a pretty widespread impact on code and tests, and I don't know which properties were implemented in the old version. I will create a JIRA and post my interim results.
> On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > That's a regression that evidently got in when the new ClusterClassificationDriver was introduced. It should be a pretty easy fix, and I will see if I can make the change before Paritosh cuts the release bits tonight.
> >
> > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> >> It appears that in k-means the clusteredPoints are now written as WeightedVectorWritable, whereas in Mahout 0.6 they were WeightedPropertyVectorWritable. Does this mean that the distance from the centroid is no longer stored there? Why? I hope I'm wrong, because that is not a welcome change. How is one to order clustered docs by distance from the cluster centroid?
> >>
> >> I'm sure I could calculate the distance, but that would mean looking up the centroid for the cluster id given in the above WeightedVectorWritable, which means iterating through all the clusters for each clustered doc. In my case the number of clusters could be fairly
> >>
> >> Am I missing something?
> >>
> >>
> >
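Recomputing distances need not mean scanning every cluster for each document: if the final centroids fit in memory, they can be loaded once into a map keyed by cluster id, so each point costs one lookup and one distance computation, after which points can be sorted by distance within their cluster. The sketch below assumes exactly that. `ClusteredPoint`, `Ranked`, and `rankByDistance` are hypothetical stand-ins for data read from clusteredPoints and the clusters output; it does not use Mahout's readers or Writable classes, and Euclidean distance is used only for illustration (it should be whatever measure built the clusters).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: order clustered documents by distance to their own centroid,
 * using a precomputed cluster-id -> centroid map (hypothetical types).
 */
public class OrderByDistance {

    // Hypothetical stand-in for one entry of clusteredPoints
    record ClusteredPoint(int clusterId, String docId, double[] vector) {}

    // Hypothetical result: a document paired with its distance to the centroid
    record Ranked(String docId, double distance) {}

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // One O(1) map lookup per point instead of a pass over all clusters
    static Map<Integer, List<Ranked>> rankByDistance(List<ClusteredPoint> points,
                                                     Map<Integer, double[]> centroids) {
        Map<Integer, List<Ranked>> byCluster = new HashMap<>();
        for (ClusteredPoint p : points) {
            double[] centroid = centroids.get(p.clusterId());
            if (centroid == null) {
                throw new IllegalArgumentException("no centroid for cluster " + p.clusterId());
            }
            byCluster.computeIfAbsent(p.clusterId(), k -> new ArrayList<>())
                     .add(new Ranked(p.docId(), euclidean(p.vector(), centroid)));
        }
        // Closest documents first within each cluster
        for (List<Ranked> ranked : byCluster.values()) {
            ranked.sort(Comparator.comparingDouble(Ranked::distance));
        }
        return byCluster;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> centroids = new HashMap<>();
        centroids.put(7, new double[] {0, 0});
        centroids.put(9, new double[] {10, 10});
        List<ClusteredPoint> points = List.of(
            new ClusteredPoint(7, "docA", new double[] {3, 4}),
            new ClusteredPoint(7, "docB", new double[] {1, 0}),
            new ClusteredPoint(9, "docC", new double[] {10, 11}));
        // prints "docB 1.0", then "docA 5.0"
        for (Ranked r : rankByDistance(points, centroids).get(7)) {
            System.out.println(r.docId() + " " + r.distance());
        }
    }
}
```

Storing the distance on each clustered point at classification time, as WeightedPropertyVectorWritable allows, avoids even this extra pass; the map approach is only a fallback when that property is missing.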
