hadoop-common-user mailing list archives

From "charles du" <taiping...@gmail.com>
Subject map tasks and processes
Date Tue, 12 Aug 2008 18:21:40 GMT

Does Hadoop always start a new process for each map task?

I have a 20-machine cluster, with each task tracker configured to run at most
2 concurrent tasks, so the cluster can run 40 tasks in parallel. If I start a
Hadoop job with 1000 tasks, will Hadoop create 1000 map processes over the
course of the job, or will it start 40 processes at the beginning and feed the
1000 tasks through them one by one (with at most 40 running at any given
time)?

My map tasks have a long initialization phase before they start processing
data files, so it would be ideal if map processes could be reused across
tasks instead of a new process being created for each one. Is there a way to
do that?
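
As a rough sketch of what process reuse would buy, the plain-Java example below
keeps the expensive state in a static field so that it is built once per JVM
and shared by every task that runs in that JVM. All names here are hypothetical
and nothing in it is a Hadoop API. The pattern only helps if the framework
actually runs more than one task in the same process, which is exactly the open
question in this message.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; none of these names come from Hadoop.
final class SharedInit {
    // Counts how many times the expensive initialization actually ran.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Expensive state, created lazily and kept for the life of the JVM.
    private static Object resource;

    // Returns the shared resource, building it on first use only.
    static synchronized Object resource() {
        if (resource == null) {
            INIT_COUNT.incrementAndGet(); // stand-in for the slow setup work
            resource = new Object();
        }
        return resource;
    }

    // Stand-in for one map task; it reuses whatever the JVM already holds.
    static void runTask(int taskId) {
        Object r = resource();
        // ... process the task's input using r ...
    }

    public static void main(String[] args) {
        for (int task = 0; task < 25; task++) {
            runTask(task);
        }
        System.out.println("initializations: " + INIT_COUNT.get());
    }
}
```

If each task instead runs in a fresh process, the static field starts out empty
every time and the initialization cost is paid once per task.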
