incubator-cassandra-user mailing list archives

From Joost Ouwerkerk <jo...@openplaces.org>
Subject Re: timeout while running simple hadoop job
Date Fri, 07 May 2010 14:08:14 GMT
The number of map tasks for a job is a function of the InputFormat,
which in the case of ColumnFamilyInputFormat is a function of the global
number of keys in Cassandra.  The number of concurrent maps being
executed at any given time per TaskTracker (per node) is set by
mapred.tasktracker.map.tasks.maximum.
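
To make the distinction concrete, here is a rough back-of-the-envelope
sketch in plain Java (no Hadoop dependency). The class and method names
are made up for illustration, and it assumes an idealized scheduler where
every map slot is always filled and both the TaskTracker count and the
per-node slot limit are positive. The total number of map tasks is taken
as an input, since it comes from the InputFormat's splits:

```java
public class MapSlotEstimate {
    // Upper bound on map tasks running at the same time across the cluster:
    // one TaskTracker per node, each capped by its per-node map slot limit.
    static int concurrentMaps(int taskTrackers, int mapSlotsPerTracker) {
        return taskTrackers * mapSlotsPerTracker;
    }

    // Number of "waves" needed to run all map tasks, rounding up.
    // Assumes at least one slot in total (hypothetical simplification).
    static int waves(int totalMapTasks, int taskTrackers, int mapSlotsPerTracker) {
        int slots = concurrentMaps(taskTrackers, mapSlotsPerTracker);
        return (totalMapTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        // Single box, per-node limit of 1: tasks run strictly one at a time.
        System.out.println(concurrentMaps(1, 1));
        System.out.println(waves(12, 1, 1));
    }
}
```

So lowering the per-node limit does not reduce how many map tasks the job
has, only how many of them hit Cassandra at once.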
j

On Fri, May 7, 2010 at 9:57 AM, Joseph Stein <cryptcom@gmail.com> wrote:
> you can manage the number of map tasks by node
>
> mapred.tasktracker.map.tasks.maximum=1
>
>
> On Fri, May 7, 2010 at 9:53 AM, gabriele renzi <rff.rff@gmail.com> wrote:
>> On Fri, May 7, 2010 at 2:44 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> Sounds like you need to configure Hadoop to not create a whole bunch
>>> of Map tasks at once
>>
>> interesting, from a quick check it seems there are a dozen threads running.
>> Yet setNumMapTasks seems to be deprecated (together with JobConf)
>> and while I guess
>>   -Dmapred.map.tasks=N
>> may still work, it seems the only way to manage the
>> number of map tasks is via a custom subclass of
>> ColumnFamilyInputFormat.
>>
>> But of course you have a point that in a single box this does not add anything.
>>
>
>
>
> --
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> */
>
