mahout-dev mailing list archives

From Jeff Eastman <jeast...@Narus.com>
Subject RE: Fields needed after clustering but not used within Mahout
Date Fri, 15 Jul 2011 17:05:56 GMT
Ok, so if you wrap your vector data in a NamedVector:
NV(<id1>,[1,2,3,4,5,6])
NV(<id2>,[1,2,3,4,5,6])
NV(<id3>,[2,3,3,4,5,7])

And keep another index file:
<id1>, (BOB, Chicago)
<id2>, (PHIL, Miami)
<id3>, (Cindy, NY)
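
As a rough illustration of that preparation step, the split could be sketched in plain Python as below. This is not Mahout code: the function name `split_records`, the generated ids (`id1`, `id2`, ...) and the assumption that the first six columns are numeric are all hypothetical choices made for the example. In a real job the numeric part would go into a NamedVector and the rest into the index file.

```python
import csv
from io import StringIO

# Sample records from this thread: six numeric fields, then name and city.
RAW = """1,2,3,4,5,6,BOB,Chicago
1,2,3,4,5,6,PHIL,Miami
2,3,3,4,5,7,Cindy,NY
"""

def split_records(text, n_numeric=6):
    """Split each record into a named numeric vector and an index entry.

    Returns two dicts keyed by the same generated id:
      named_vectors[id] -> list of floats (the part to be clustered)
      index[id]         -> tuple of the remaining fields (kept aside)
    """
    named_vectors = {}
    index = {}
    for row_number, row in enumerate(csv.reader(StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        record_id = f"id{row_number}"  # hypothetical id scheme
        named_vectors[record_id] = [float(x) for x in row[:n_numeric]]
        index[record_id] = tuple(field.strip() for field in row[n_numeric:])
    return named_vectors, index

named_vectors, index = split_records(RAW)
```

Any id scheme works as long as it is unique per record and is written to both outputs, since the id is the only link between the vector and the fields left out of clustering.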

Then what you will get out of clustering will be:
NV(<id1>,[1,2,3,4,5,6])      is Cluster 0
NV(<id2>,[1,2,3,4,5,6])      is Cluster 0
NV(<id3>,[2,3,3,4,5,7])      is Cluster 1

Finally you can join them back together to get:
1,2,3,4,5,6,BOB, Chicago     is Cluster 0
1,2,3,4,5,6,PHIL, Miami      is Cluster 0
2,3,3,4,5,7,Cindy, NY        is Cluster 1
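
The join itself is a lookup by id. A minimal sketch in plain Python, assuming the cluster assignments have already been read out of the clustering output into a dict of id to cluster number (the `assignments` dict and the `join_clusters` helper are hypothetical stand-ins, not part of Mahout):

```python
def join_clusters(assignments, vectors, index):
    """Rebuild the full records, each tagged with its cluster.

    assignments: id -> cluster number (assumed already extracted)
    vectors:     id -> numeric fields that were clustered
    index:       id -> extra fields that were kept out of clustering
    """
    lines = []
    for record_id, cluster in assignments.items():
        fields = [str(v) for v in vectors[record_id]] + list(index[record_id])
        lines.append(f"{','.join(fields)} is Cluster {cluster}")
    return lines

# Data from the example in this message.
vectors = {
    "id1": [1, 2, 3, 4, 5, 6],
    "id2": [1, 2, 3, 4, 5, 6],
    "id3": [2, 3, 3, 4, 5, 7],
}
index = {
    "id1": ("BOB", "Chicago"),
    "id2": ("PHIL", "Miami"),
    "id3": ("Cindy", "NY"),
}
assignments = {"id1": 0, "id2": 0, "id3": 1}

joined = join_clusters(assignments, vectors, index)
for line in joined:
    print(line)
```

For data that does not fit in memory, the same join would more likely be done as a keyed join on the id in whatever system holds the index file, but the logic is the same.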

-----Original Message-----
From: dbg [mailto:dvd.gettier@yahoo.com] 
Sent: Friday, July 15, 2011 6:46 AM
To: mahout-dev@lucene.apache.org
Subject: Re: Fields needed after clustering but not used within Mahout

To elaborate further...

The data I am clustering is:
1,2,3,4,5,6,BOB,Chicago
1,2,3,4,5,6,PHIL,Miami
2,3,3,4,5,7,Cindy,NY

The data I vectorize and send through Mahout/k-means is:
1,2,3,4,5,6
1,2,3,4,5,6
2,3,3,4,5,7

The data I get back is:
1,2,3,4,5,6      is Cluster 0
1,2,3,4,5,6      is Cluster 0
2,3,3,4,5,7      is Cluster 1


The data I want is:

1,2,3,4,5,6,BOB, Chicago     is Cluster 0
1,2,3,4,5,6,PHIL, Miami      is Cluster 0
2,3,3,4,5,7,Cindy, NY        is Cluster 1

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Fields-needed-after-clustering-but-not-used-within-Mahout-tp3170297p3171977.html
Sent from the Mahout Developer List mailing list archive at Nabble.com.
