hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: Hadoop datanode crashed - SIGBUS
Date Mon, 01 Dec 2008 21:40:38 GMT
I'd run memcheck overnight on the nodes that caused the problem, just  
to be sure.

Another (unlikely) possibility is that the JNI callouts for the native
libraries Hadoop uses (for the compression codecs, I believe) crashed
or were set up wrong, and failed badly enough to take down the JVM.
Are you using any compression?  If the crash correlates well with a
job running, does that job complete successfully in "local" mode?
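
To check whether the crashes line up across nodes or with job runs, it can help to pull the fields out of the crash header quoted below. This is only a minimal sketch in Python: the function name parse_crash_header and the regular expression are my own assumptions, written against the single header line in this thread, not against any documented format.

```python
import re

# Matches the one-line crash header the JVM printed in this thread, e.g.
# "#  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408"
# The pattern is an assumption derived from that single sample line.
CRASH_RE = re.compile(
    r"#\s+(?P<signal>SIG[A-Z]+)\s+\((?P<signo>0x[0-9a-fA-F]+)\)\s+at\s+"
    r"pc=(?P<pc>0x[0-9a-fA-F]+),\s+pid=(?P<pid>\d+),\s+tid=(?P<tid>\d+)"
)


def parse_crash_header(text):
    """Return the fields of the first crash header found in text, or None."""
    match = CRASH_RE.search(text)
    if match is None:
        return None
    return {
        "signal": match.group("signal"),
        "signo": int(match.group("signo"), 16),  # hex signal number
        "pc": int(match.group("pc"), 16),        # faulting program counter
        "pid": int(match.group("pid")),
        "tid": int(match.group("tid")),
    }


if __name__ == "__main__":
    # Quote markers in front of the line (">>> ") do not affect the search.
    sample = ">>> #  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408"
    print(parse_crash_header(sample))
```

If the same pc shows up on several nodes, that might point more toward a shared code path (such as a native library) than toward one bad memory module, though that is a heuristic rather than a diagnosis.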

Brian

On Dec 1, 2008, at 3:32 PM, Sagar Naik wrote:

>
>
> Brian Bockelman wrote:
>> Hardware/memory problems?
> I'm not sure.
>>
>> SIGBUS is relatively rare; it sometimes indicates a hardware error  
>> in the memory system, depending on your arch.
>>
> *uname -a : *
> Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux
> *top's top*
> Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,  0.0% si
> Mem:   8288280k total,  1575680k used,  6712600k free,     5392k buffers
> Swap: 16386292k total,       68k used, 16386224k free,   522408k cached
>
> 8 cores, Xeon 2 GHz
>
>> Brian
>>
>> On Dec 1, 2008, at 3:00 PM, Sagar Naik wrote:
>>
>>> A couple of the datanodes crashed with the following error.
>>> /tmp is 15% full.
>>>
>>> #
>>> # An unexpected error has been detected by Java Runtime Environment:
>>> #
>>> #  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
>>> #
>>> [Too many errors, abort]
>>>
>>> Please suggest how I should go about debugging this problem.
>>>
>>>
>>> -Sagar
>>
>
> Thanks to Brian
>
> -Sagar

