mahout-user mailing list archives

From Tharindu Mathew <mcclou...@gmail.com>
Subject Re: How to use clusterpp?
Date Fri, 17 Feb 2012 13:09:51 GMT
Or I can just use the cluster dump tool, right...?

On Fri, Feb 17, 2012 at 5:55 PM, Paritosh Ranjan <pranjan@xebia.com> wrote:

> Try logging in and updating.
>
> Thanks...
>
> On 17-02-2012 17:54, Tharindu Mathew wrote:
>
>> Off-topic: How would I contribute a documentation patch?
>>
>> On Fri, Feb 17, 2012 at 3:11 PM, gaurav redkar <gauravredkar@gmail.com> wrote:
>>
>>> If that (the SequenceFile header) is the only thing contained in the
>>> part-r-* file, then the reducer responsible for writing that part-r-*
>>> file did not receive any input records. This happens because the program
>>> uses the default hash partitioner, which sometimes maps records belonging
>>> to different clusters to the same reducer, leaving some reducers without
>>> any input records.
>>>
>>> The simplest and quickest way to view the contents of the part-r-* files
>>> is to change the job's output format from SequenceFileOutputFormat to
>>> TextOutputFormat and comment out the line where the program calls the
>>> movePartFilesToRespectiveDirectories() function, since this function
>>> expects the part-r-* files to be in SequenceFile format. This way you will
>>> get all the part files in human-readable format.
>>>
>>> You can later modify the movePartFilesToRespectiveDirectories() function
>>> to move the part-r-* files to their respective directories.
>>>
>>> Hope this helps.
>>>
>>>
>>> On Fri, Feb 17, 2012 at 2:36 PM, Paritosh Ranjan <pranjan@xebia.com> wrote:
>>>
>>>> Check this out: https://cwiki.apache.org/MAHOUT/top-down-clustering.html
>>>>
>>>> It explains how to use clusterpp.
>>>>
>>>> You will not get a human-readable version. The output is in SequenceFile
>>>> format, which is not human readable. SequenceFile is a key-value format:
>>>> you will have to iterate over it, read each key and value, and print them
>>>> to a text file or the console.
>>>>
>>>> Look into the package org.apache.mahout.common.iterator.sequencefile.
>>>> It contains some utility classes that can help you iterate through
>>>> SequenceFile files.
>>>>
>>>> On 17-02-2012 14:18, Tharindu Mathew wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to reproduce https://issues.apache.org/jira/browse/MAHOUT-966
>>>>>
>>>>> When executing clusterpp, I get output such as this:
>>>>>
>>>>> $ bin/hadoop fs -cat /user/mackie/output/ppclusters/part-r-00999
>>>>> SEQorg.apache.hadoop.io.Text%org.apache.mahout.math.VectorWritable_䪖?g???8?-??
>>>>>
>>>>> Is this normal? I thought I would get some human-readable output when
>>>>> this was used... I tried searching around but couldn't find any
>>>>> documentation regarding clusterpp.
>>>>>
>>
>>
>
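Gaurav's explanation of the empty part-r-* files can be illustrated with a small, standalone Java sketch. It applies the formula used by Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, to a few hypothetical cluster IDs, assuming one reducer per cluster (as the part-r-00999 file name suggests). For simplicity the IDs are plain Integer keys, whose hashCode() is the value itself; the real clusterpp job keys on Text, whose hash differs, so actual assignments will vary, but the collision effect is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionDemo {

    // Formula used by Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups cluster IDs by the reducer they would be routed to.
    // Every reducer gets an entry, so reducers with no input stay visible.
    static Map<Integer, List<Integer>> assign(List<Integer> clusterIds, int numReduceTasks) {
        Map<Integer, List<Integer>> byReducer = new TreeMap<>();
        for (int r = 0; r < numReduceTasks; r++) {
            byReducer.put(r, new ArrayList<>());
        }
        for (Integer id : clusterIds) {
            byReducer.get(partitionFor(id, numReduceTasks)).add(id);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Hypothetical cluster IDs; Integer.hashCode() returns the value itself.
        List<Integer> clusterIds = List.of(3, 7, 10, 14);
        // Assumption: one reduce task per cluster.
        Map<Integer, List<Integer>> byReducer = assign(clusterIds, clusterIds.size());
        for (Map.Entry<Integer, List<Integer>> e : byReducer.entrySet()) {
            System.out.printf("part-r-%05d <- %s%n", e.getKey(), e.getValue());
        }
        // prints:
        // part-r-00000 <- []
        // part-r-00001 <- []
        // part-r-00002 <- [10, 14]
        // part-r-00003 <- [3, 7]
    }
}
```

Two clusters land on each of reducers 2 and 3, so reducers 0 and 1 receive nothing and would write part files containing only the SequenceFile header, like the file quoted in the thread.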


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/
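The bytes at the start of the quoted `hadoop fs -cat` output are a SequenceFile header: the magic "SEQ", a version byte, then the key and value class names, each preceded by a one-byte length. The length of org.apache.mahout.math.VectorWritable is 37, which is ASCII '%', which is why a '%' appears between the two class names. The standalone Java sketch below decodes just that part of the header from a byte array. It does not use Hadoop or Mahout, handles only class names shorter than 128 bytes, treats the version value 6 in its sample as an assumption, and does not read the records that follow; for iterating over actual keys and values, the utilities in org.apache.mahout.common.iterator.sequencefile mentioned in the thread are the intended route.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SeqHeaderDemo {

    // Reads a string written Hadoop-Text style: a VInt length, then UTF-8 bytes.
    // Simplification: only single-byte VInts (lengths 0..127) are supported.
    private static String readShortText(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte VInt length not supported in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Returns {keyClassName, valueClassName} from the start of a SequenceFile.
    static String[] readClassNames(byte[] fileStart) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileStart))) {
            byte[] magic = new byte[3];
            in.readFully(magic);
            if (!"SEQ".equals(new String(magic, StandardCharsets.US_ASCII))) {
                throw new IOException("not a SequenceFile: missing SEQ magic");
            }
            in.readByte(); // format version byte, ignored here
            String keyClass = readShortText(in);
            String valueClass = readShortText(in);
            return new String[] {keyClass, valueClass};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds the first bytes of a header like the one in the quoted output.
    static byte[] sampleHeader() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] magic = "SEQ".getBytes(StandardCharsets.US_ASCII);
        out.write(magic, 0, magic.length);
        out.write(6); // version byte; the value 6 is an assumption
        String[] classes = {"org.apache.hadoop.io.Text", "org.apache.mahout.math.VectorWritable"};
        for (String cls : classes) {
            byte[] b = cls.getBytes(StandardCharsets.UTF_8);
            out.write(b.length); // 25, then 37 (ASCII '%')
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String[] names = readClassNames(sampleHeader());
        System.out.println("key class:   " + names[0]);
        System.out.println("value class: " + names[1]);
    }
}
```

To inspect a real part file, one could copy it locally (for example with hadoop fs -get) and pass its bytes to readClassNames.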
