hadoop-common-user mailing list archives

From: Vasilis Liaskovitis <vlias...@gmail.com>
Subject: Re: swapping on hadoop
Date: Fri, 02 Apr 2010 00:04:46 GMT
Hi,

On Thu, Apr 1, 2010 at 2:02 PM, Scott Carey <scott@richrelevance.com> wrote:
>> In this example, what hadoop config parameters do the above 2 buffers
>> refer to? io.sort.mb=250, but which parameter does the "map side join"
>> 100MB refer to? Are you referring to the split size of the input data
>> handled by a single map task?
>
> "Map side join" in just an example of one of many possible use cases where a particular
map implementation may hold on to some semi-permanent data for the whole task.
> It could be anything that takes 100MB of heap and holds the data across individual calls
to map().
>

ok. Now, considering a map-side space buffer and a sort buffer, do
both account for tenured space in both map and reduce JVMs? I'd
think the map-side buffer gets used and tenured in map tasks, and the
sort space gets used and tenured in reduce tasks during the sort/merge
phase. Would both spaces really be used in both kinds of tasks?
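
Concretely, I picture something like the hypothetical mapper below: a
lookup table loaded once in setup() stays on the heap across all
map() calls, so it eventually gets promoted to the tenured
generation. Class and field names are mine, not from any real job.

  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Hypothetical map-side-join-style mapper: the lookup table is
  // loaded once per task and held for the task's lifetime.
  public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
      // A real job would populate this from a side file (e.g. via the
      // DistributedCache); one entry stands in for ~100MB of data.
      lookup.put("some-key", "some-value");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // The table outlives each call, hence ends up in tenured space.
      String[] fields = value.toString().split("\t");
      String match = lookup.get(fields[0]);
      if (match != null) {
        context.write(new Text(fields[0]), new Text(match));
      }
    }
  }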

>
> Java typically uses 5MB to 60MB for classloader data (statics, classes) and some space
> for threads, etc.  The default thread stack on most OS's is about 1MB, and the number of
> threads for a task process is on the order of a dozen.
> Getting 2-3x the space in a java process outside the heap would require either a huge
> thread count, a large native library loaded, or perhaps a non-java hadoop job using pipes.
> It would be rather obvious in 'top' if you sort by memory (shift-M on linux), or vmstat,
> etc.  To get the current size of the heap of a process, you can use jstat or 'kill -3' to
> create a stack dump and heap summary.
>
thanks, good to know.
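
As a cross-check from inside a JVM, I suppose one could also dump the
heap/non-heap split and the live thread count with the standard
java.lang.management beans -- a quick sketch, nothing hadoop-specific:

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryMXBean;
  import java.lang.management.MemoryUsage;
  import java.lang.management.ThreadMXBean;

  // Prints how much of the process footprint is heap vs. non-heap
  // (classes, etc.) and how many threads are live (~1MB stack each).
  public class MemoryReport {
    public static void main(String[] args) {
      MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
      MemoryUsage heap = mem.getHeapMemoryUsage();
      MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
      ThreadMXBean threads = ManagementFactory.getThreadMXBean();

      System.out.printf("heap used/committed: %d MB / %d MB%n",
          heap.getUsed() >> 20, heap.getCommitted() >> 20);
      System.out.printf("non-heap used/committed: %d MB / %d MB%n",
          nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
      System.out.printf("live threads: %d%n", threads.getThreadCount());
    }
  }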

>>
>> With this new setup, I don't normally get swapping for a single job,
>> e.g. a terasort or hive job. However, the problem in general is
>> exacerbated if one spawns multiple independent hadoop jobs
>> simultaneously. I've noticed that JVMs are not re-used across jobs,
>> as in an earlier post:
>> http://www.mail-archive.com/common-dev@hadoop.apache.org/msg01174.html
>> This implies that Java memory usage would blow up when submitting
>> multiple independent jobs. So this multiple-job scenario sounds more
>> susceptible to swapping.
>>
> The maximum number of map and reduce tasks per node applies no matter how many jobs are
> running.

Right. But depending on your job scheduler, isn't it possible that
you end up swapping the different jobs' JVM space in and out of
physical memory while all the parallel jobs are being scheduled?
Especially if nodes don't have huge amounts of memory, this scenario
sounds likely.
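
(For anyone following along, the per-node caps Scott mentions are, if
I'm not mistaken, these mapred-site.xml settings -- the values below
are just illustrative:)

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>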

>
>> A relevant question is: in production environments, do people run jobs
>> in parallel? Or is it that the majority of jobs are serial pipelines /
>> cascades of jobs run back to back?
>>
> Jobs are absolutely run in parallel.  I recommend using the fair scheduler with no config
> parameters other than 'assignmultiple = true' as the 'baseline' scheduler, and adjust from
> there accordingly.  The Capacity Scheduler has more tuning knobs for dealing with memory
> constraints if jobs have drastically different memory needs.  The out-of-the-box FIFO scheduler
> tends to have a hard time keeping the cluster utilization high when there are multiple jobs
> to run.

thanks, I'll try this.
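
If I read the fair scheduler contrib docs correctly, that baseline
would look roughly like this in the jobtracker's mapred-site.xml
(please correct me if the property names are off):

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>true</value>
  </property>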

Back to a single job running, and assuming all heap space is being
used: what percentage of a node's memory would you leave for other
functions, esp. the disk cache? I currently leave only 25% of memory
(~4GB) for non-heap JVM data; I'd guess there is a sweet spot,
probably dependent on the job's I/O characteristics.
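
For concreteness, on a hypothetical 16GB node: 10 task slots x 1GB
heap = 10GB for task JVMs, plus roughly 1GB each for the DataNode and
TaskTracker daemons, leaves ~4GB -- the 25% above -- for the OS and
page cache. The slot count and heap sizes are made up, but that's the
kind of split I mean.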

- Vasilis
