mahout-user mailing list archives

From Luke Forehand <>
Subject Re: override mapreduce compression?
Date Wed, 07 Mar 2012 00:28:45 GMT
I want the results of the k-means clustering to be either uncompressed or
compressed in a format my users can decompress natively on their
machines.  All our other Hadoop jobs write Snappy-compressed output, but
our users don't have Snappy and don't particularly want to install it
(especially given the problems installing it on Mac).  I'll try adding
this parameter to HADOOP_OPTS and, in the long term, probably come up
with a cleaner way to do this.  Thanks!
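A minimal sketch of the HADOOP_OPTS approach mentioned above. It assumes the Hadoop 0.20/1.x property name mapred.output.compress, and that the mahout launcher ends up invoking the hadoop script, which passes $HADOOP_OPTS to the JVM. Whether a JVM system property set this way is actually picked up by the job configuration depends on the Hadoop and Mahout versions in use and is not verified here.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not verified against any specific release):
#  - mapred.output.compress is the Hadoop 0.20/1.x key for job output compression
#  - the mahout launcher runs `hadoop`, which passes $HADOOP_OPTS to the JVM
#  - the JVM system property is honoured by the job configuration

# Append the override to whatever HADOOP_OPTS already contains,
# adding a separating space only when it was non-empty.
HADOOP_OPTS="${HADOOP_OPTS:+$HADOOP_OPTS }-Dmapred.output.compress=false"
export HADOOP_OPTS

# Only launch the job if the mahout launcher is available on the PATH.
if command -v mahout >/dev/null 2>&1; then
  mahout kmeans -i /mahout/sparse/test1/tfidf-vectors \
    -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 \
    -k 10000 -cd 0.01 -x 100
fi
```

If the system property turns out to be ignored, setting the same property in the client-side Hadoop configuration files, as suggested in the quoted reply below, is the alternative.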


On 3/6/12 6:24 PM, "Sean Owen" <> wrote:

>-D arguments go to the JVM, so (as I recall) they need to be set in
>HADOOP_OPTS. Or you can configure this in your Hadoop config files. They
>have no meaning to the driver script. Why do you want to disable
>compression after the mapper?
>On Wed, Mar 7, 2012 at 12:11 AM, Luke Forehand
><> wrote:
>> I tried the following and it does not work:
>> mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c
>> /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01
>> -x 100 \
>> And the default codec is still being used (Snappy in this case; I don't
>> want the users to have to install native Snappy, which is why I'm trying
>> to override this parameter).  Passing -Dkey=value on the command line
>> does not seem to have any effect on the MapReduce job configuration, as
>> far as I can tell.  Any ideas?
>> -Luke
>> On 3/6/12 3:48 PM, "Sean Owen" <> wrote:
>>>Mapper compression? I think the
>>>key was mapred.output.compress in Hadoop 0.20.0.
>>>I am not sure if there is reducer compression built in, but I could
>>>have missed it.
>>>On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand
>>><> wrote:
>>>> Hello,
>>>> Is there a way to run the Mahout kmeans program from the command line
>>>>with a parameter that overrides (and disables) the reducer task
>>>>compression?  I have tried several different ways of specifying a -D
>>>>parameter, but I can't seem to get any options to pass through to the
>>>>Hadoop MapReduce configuration.
>>>> Thanks!
>>>> Luke
