The number of map tasks for a job is a function of the InputFormat,
which in the case of ColumnFamilyInputFormat is a function of the global
number of keys in Cassandra. The number of concurrent maps being
executed at any given time per TaskTracker (per node) is set by
mapred.tasktracker.map.tasks.maximum.
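
To make the distinction concrete, here is a rough back-of-the-envelope sketch in plain Java (no Hadoop dependency). It assumes the job's map task count equals the number of input splits the InputFormat returns, and that each node runs one TaskTracker capped by mapred.tasktracker.map.tasks.maximum. The class name, method names, and the numbers in main are made up for illustration.

```java
class MapSlots {
    // Cluster-wide map slots: one TaskTracker per node (assumed), each
    // limited by mapred.tasktracker.map.tasks.maximum.
    static int concurrentMaps(int nodes, int maxMapsPerTracker) {
        return nodes * maxMapsPerTracker;
    }

    // How many rounds are needed to run every map task when only
    // `slots` of them can execute at once (ceiling division).
    static int waves(int mapTasks, int slots) {
        return (mapTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        int mapTasks = 12;      // hypothetical: splits produced by the InputFormat
        int nodes = 1;          // hypothetical: a single box
        int maxPerTracker = 1;  // mapred.tasktracker.map.tasks.maximum=1

        int slots = concurrentMaps(nodes, maxPerTracker);
        System.out.println("concurrent maps: " + slots);
        System.out.println("waves to finish: " + waves(mapTasks, slots));
    }
}
```

So lowering the per-TaskTracker maximum does not change how many map tasks the job has, only how many of them run at the same time.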
j
On Fri, May 7, 2010 at 9:57 AM, Joseph Stein <cryptcom@gmail.com> wrote:
> you can manage the number of map tasks by node
>
> mapred.tasktracker.map.tasks.maximum=1
>
>
> On Fri, May 7, 2010 at 9:53 AM, gabriele renzi <rff.rff@gmail.com> wrote:
>> On Fri, May 7, 2010 at 2:44 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> Sounds like you need to configure Hadoop to not create a whole bunch
>>> of Map tasks at once
>>
>> Interesting, from a quick check it seems there are a dozen threads running.
>> Yet setNumMapTasks seems to be deprecated (together with JobConf),
>> and while I guess
>> -Dmapred.map.tasks=N
>> may still work, it seems the only way to manage the
>> number of map tasks is via a custom subclass of
>> ColumnFamilyInputFormat.
>>
>> But of course you have a point that on a single box this does not add anything.
>>
>
>
>
> --
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> */
>