hadoop-common-user mailing list archives

From sset <satish.se...@hcl.com>
Subject increase number of map tasks
Date Mon, 09 Jan 2012 16:46:50 GMT

Hello,

In HDFS we have set the block size to 40 bytes. The input data set is shown
below; each record is terminated with a line feed.

data1   (5*8=40 bytes)
data2
......
.......
data10
 
 
But we still see only 2 map tasks spawned, when there should have been at
least 10. Each mapper performs a complex mathematical computation. We are not
sure how splitting works internally; the line feeds do not seem to affect it.
Even with the settings below, the number of map tasks never goes beyond 2. Is
there any way to make this spawn 10 tasks? Basically it should behave like a
compute grid, with the computation running in parallel.
 
<property>
  <name>io.bytes.per.checksum</name>
  <value>30</value>
  <description>The number of bytes per checksum.  Must not be larger than
  io.file.buffer.size.</description>
</property>

<property>
  <name>dfs.block.size</name>
  <value>30</value>
  <description>The default block size for new files.</description>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>
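
As far as I understand it, mapred.tasktracker.map.tasks.maximum only caps how many map tasks run at the same time on one TaskTracker; it does not decide how many splits a job gets. A minimal sketch of a related setting, assuming the old mapred API with a FileInputFormat-based input format: mapred.map.tasks is passed to the input format as a hint for the number of splits, and I believe its stock default is 2, which may be why only 2 tasks appear. It is only a hint, and byte-based splits are not guaranteed to line up as one record per task.

```xml
<!-- Sketch only: assumes the old mapred API and a FileInputFormat-based job. -->
<property>
  <name>mapred.map.tasks</name>
  <!-- A hint for the desired number of splits, not a hard guarantee. -->
  <value>10</value>
  <description>The default number of map tasks per job.</description>
</property>
```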

This is a single node with a high-end configuration: 8 CPUs and 8 GB of memory.
That is why we are using an example of 10 data items separated by line feeds.
We want to utilize the full power of the machine, so we want at least 10 map
tasks, each of which performs a highly complex mathematical simulation. At
present it looks like the size of the file data is the only way to control the
number of map tasks, via the split size (in bytes), but I would prefer some
other criterion such as line feeds.
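
For a line-based criterion, one option that might fit is NLineInputFormat from the old mapred API, which creates one split per N input lines instead of splitting by bytes. The fragment below is a sketch under assumptions: it presumes the job reads its input format from the configuration (the usual route is JobConf.setInputFormat in the driver, which I believe stores the class under mapred.input.format.class) and that nothing in the job overrides it.

```xml
<!-- Sketch only: assumes the old mapred API and that the driver does not
     set a different input format itself. -->
<property>
  <name>mapred.input.format.class</name>
  <value>org.apache.hadoop.mapred.lib.NLineInputFormat</value>
</property>

<property>
  <name>mapred.line.input.format.linespermap</name>
  <!-- One input line per split, so 10 lines would yield 10 map tasks. -->
  <value>1</value>
</property>
```

With one line per map, the 10-line example file should produce 10 splits regardless of block size, though how many of them run in parallel still depends on the TaskTracker's map slot limit.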

How do we get 10 map tasks from the above configuration? Please help.

thanks
 
-- 
View this message in context: http://old.nabble.com/increase-number-of-map-tasks-tp33107775p33107775.html

