mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Mahout clusterdump
Date Wed, 23 May 2012 13:26:13 GMT
Cluster 'names' are just numbers and these are allocated automatically. 
NamedVectors can be used to assign meaningful identifiers (e.g. document 
names) to input vectors but not to clusters. As clusters are produced 
dynamically, I'm not sure how one could reliably assign meaningful 
identifiers to them a priori. Wouldn't the name be derived by inspection 
of the cluster centers after iterations complete?

Since you are clustering documents, perhaps you could post-process your 
clusters using your dictionary to assign names based upon the largest n 
terms in each cluster center vector. A 'NamedCluster' wrapper could then 
preserve this information into subsequent clustering steps.
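
As a rough illustration of that post-processing step, here is a minimal 
Python sketch. It is not Mahout code: `top_terms` and `name_cluster` are 
hypothetical helpers, the cluster center is assumed to be already loaded 
as a plain index-to-weight mapping, and the dictionary terms are invented 
placeholders (the weights echo the VL-3104 dump quoted below).

```python
def top_terms(center, dictionary, n=3):
    """Return the n dictionary terms with the largest weights in a center.

    center:     {term_index: weight}, assumed already read from the cluster
    dictionary: {term_index: term},   assumed already read from the dictionary
    """
    # Sort by descending weight; break ties by term index for stable output.
    ranked = sorted(center.items(), key=lambda kv: (-kv[1], kv[0]))
    # Fall back to the raw index when the dictionary has no entry for it.
    return [dictionary.get(idx, str(idx)) for idx, _ in ranked[:n]]


def name_cluster(cluster_id, center, dictionary, n=3):
    """Build a readable label from the cluster id and its top n terms."""
    return f"{cluster_id} ({', '.join(top_terms(center, dictionary, n))})"


# Weights taken from the VL-3104 center in the dump; terms are placeholders.
center = {188: 0.013, 196: 0.012, 293: 0.012, 513: 0.014, 564: 0.013, 606: 0.014}
dictionary = {188: "economy", 196: "price", 293: "trade",
              513: "market", 564: "credit", 606: "bank"}

print(name_cluster("VL-3104", center, dictionary, n=2))  # VL-3104 (market, bank)
```

The label keeps the original cluster id so it can still be matched against 
the clusterdump output.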

On 5/23/12 3:07 AM, Paritosh Ranjan wrote:
> As far as I know, cluster names cannot be changed.
> However, there is a NamedVector which can be used to name vectors. Not 
> sure whether it can be used in this case.
>
> On 23-05-2012 02:57, Bahadır Yılmaz wrote:
>> Hi everyone,
>> For cluster dumping I am using this command:
>> ----------------------------------------------------------------------
>>
>> bin/mahout clusterdump \
>>    -d haberdata-vectorsall/dictionary.file-0 \
>>    -dt sequencefile \
>>    -s haberdata-kmeans-clustersall/clusters-10-final/part-r-00000 \
>>    -n 20 \
>>    -b 100 \
>>    -p haberdata-kmeans-clustersall/clusteredPoints/ \
>>    -o outputall/outFromIdea_kmeansWithEuclidean.txt \
>>    -of TEXT
>> ----------------------------------------------------------------------
>>
>> This command gives output like this:
>> ----------------------------------------------------------------------
>>
>> :VL-3104{n=6 c=[188:0.013, 196:0.012, 293:0.012, 513:0.014, 
>> 564:0.013, 606:0.014, 670:0.014, 870:0.01
>>
>>     Weight : [props - optional]:  Point:
>>     1.0 : [distance=0.8604413702422293]: 1277 = [188:0.075, 
>> 196:0.070, .............
>>     1.0 : [distance=0.7554798862508255]: 3109 = [aları:0.067, 
>> alexei:0.075...........
>>
>> ----------------------------------------------------------------------
>>
>>
>> I want to ask a question: can I rename my clusters? For example, in 
>> this output my cluster name is VL-3104, and I want to replace this 
>> name with the words that best represent my cluster.
>>
>> Sorry for my English.
>> Thanks.
>
>
>

