hadoop-hdfs-user mailing list archives

From Forrest Aldrich <for...@gmail.com>
Subject Re: Resource limits with Hadoop and JVM
Date Sat, 28 Sep 2013 04:48:49 GMT
I wanted to elaborate on what happened.

A Hadoop slave was added to a live cluster.   It turns out, I think, that 
mapred-site.xml was not configured with the correct master host.  But 
in any case, when these commands were run:


  * $ hadoop mradmin -refreshNodes
  * $ hadoop dfsadmin -refreshNodes
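
For reference, those -refreshNodes commands just make the JobTracker and 
NameNode re-read the include/exclude host files named in the config, so the 
master address and those files are worth double-checking before running them. 
A rough sketch of the relevant Hadoop 1.x properties -- the host name and 
file path here are made up for illustration:

    <!-- mapred-site.xml: JobTracker address, plus the include file
         that "hadoop mradmin -refreshNodes" re-reads -->
    <property>
      <name>mapred.job.tracker</name>
      <value>master.example.com:9001</value>  <!-- hypothetical master -->
    </property>
    <property>
      <name>mapred.hosts</name>
      <value>/etc/hadoop/allowed-hosts</value>  <!-- hypothetical path -->
    </property>

    <!-- hdfs-site.xml: the include file that
         "hadoop dfsadmin -refreshNodes" re-reads -->
    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/allowed-hosts</value>
    </property>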

The master went completely berserk, climbing to a system load of 60, at 
which point it froze.

This should never, ever happen -- no matter what the issue.   So what 
I'm trying to understand is how to prevent this while letting 
Hadoop/Java go about its business.
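
Along the lines of Vinod's suggestion below, the Hadoop 1.x memory-monitoring 
settings are one way to keep tasks boxed in: jobs declare memory up front and 
the TaskTracker kills tasks that exceed what they asked for. A sketch with 
made-up values -- please verify the property names against the cluster_setup 
page for your exact version (the reduce-side properties are analogous):

    <!-- mapred-site.xml: per-slot and per-job memory limits
         (sizes are illustrative, not recommendations) -->
    <property>
      <name>mapred.cluster.map.memory.mb</name>
      <value>1024</value>  <!-- virtual memory per map slot -->
    </property>
    <property>
      <name>mapred.cluster.max.map.memory.mb</name>
      <value>4096</value>  <!-- most a single map task may request -->
    </property>
    <property>
      <name>mapred.job.map.memory.mb</name>
      <value>2048</value>  <!-- what a job requests per map task -->
    </property>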

We are using an older version of Hadoop (1.0.1), so maybe we hit a bug; I 
can't really tell.

I read an article about Spotify experiencing issues like this and some of 
their approaches, but it's not clear which of those fixes applies here (I'm 
a newbie).
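
On the OS side, the usual knobs are per-user limits plus explicit daemon 
heap sizes. A hypothetical RHEL example -- the "hadoop" user name and all 
of the numbers are placeholders, not recommendations:

    # /etc/security/limits.conf -- cap the user running the daemons so a
    # runaway JVM can't take the whole box down
    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768
    hadoop  soft  nproc   16384
    hadoop  hard  nproc   16384

    # conf/hadoop-env.sh -- bound the daemon heaps explicitly
    export HADOOP_HEAPSIZE=1024   # MB, applies to all daemons
    export HADOOP_TASKTRACKER_OPTS="-Xmx1g $HADOOP_TASKTRACKER_OPTS"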


Thanks.



On 9/16/13 5:04 PM, Vinod Kumar Vavilapalli wrote:
> I assume you are on Linux. Also assuming that your tasks are so 
> resource intensive that they are taking down nodes. You should enable 
> limits per task, see 
> http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring
>
> What it does is force jobs to declare their resource requirements up 
> front, and the TTs then enforce those limits.
>
> HTH
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote:
>
>> We recently experienced a couple of situations that brought one or 
>> more Hadoop nodes down (unresponsive).   One was related to a bug in 
>> a utility we use (ffmpeg) that was resolved by compiling a new 
>> version. The next, today, occurred after attempting to join a new 
>> node to the cluster.
>>
>> A basic start of the (local) tasktracker and datanode did not work -- 
>> so, based on a reference, I issued hadoop mradmin -refreshNodes, to be 
>> followed by hadoop dfsadmin -refreshNodes.    The load average jumped 
>> to 60 and the master (which also runs a slave) became unresponsive.
>>
>> Seems to me that this should never happen.   But, looking around, I 
>> saw an article from Spotify which mentioned the need to set certain 
>> resource limits on the JVM as well as in the system itself 
>> (limits.conf; we run RHEL).    I (and we) are fairly new to Hadoop, 
>> so some of these issues are very new to us.
>>
>> I wonder if some of the experts here might be able to comment on this 
>> issue - perhaps point out settings and other measures we can take to 
>> prevent this sort of incident in the future.
>>
>> Our setup is not complicated.   We have 3 Hadoop nodes; the first is 
>> also a master and a slave (and has more resources, too).   The 
>> underlying system splits tasks out to ffmpeg (which is another issue, 
>> as it tends to eat resources, but so far, with a recompile, we are 
>> good).   We have two more hardware nodes to add shortly.
>>
>>
>> Thanks!
>
>

