hadoop-common-user mailing list archives

From Chris Collins <ch...@scoutlabs.com>
Subject Re: Hadoop datanode crashed - SIGBUS
Date Mon, 01 Dec 2008 22:04:19 GMT
I had some pretty bad issues with leaks in _07. _10, by the way, has a lot
of bug fixes, though I don't know whether it would fix this problem. As for
flags, I wouldn't know. One thing you could try is to match the program
counter against the memory regions of the loaded libraries. If you use
jstack or jmap (I can't remember which), it will give you a dump of all the
libraries and their memory address ranges. From that you may see whether
the program counter falls inside anything interesting.
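A minimal sketch of the address-matching idea, assuming the list of libraries and address ranges is available as text in the Linux /proc/<pid>/maps line format ("start-end perms offset dev inode path"). On Linux the hs_err_pid<pid>.log that HotSpot writes on a crash usually has a "Dynamic libraries:" section in that format, but the log in this thread ends with "[Too many errors, abort]", so it may be truncated; the maps of a freshly started datanode will not necessarily use the same addresses. The function names and the sample maps excerpt are hypothetical.

```python
import re
from typing import Optional

# HotSpot crash header, e.g. "#  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408"
HEADER_RE = re.compile(r"(SIG\w+) \((0x[0-9a-fA-F]+)\) at pc=(0x[0-9a-fA-F]+)")

# One line in /proc/<pid>/maps format: "start-end perms offset dev inode [path]"
MAPS_RE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")


def parse_pc(header_line: str) -> int:
    """Return the program counter from a crash header line as an int."""
    m = HEADER_RE.search(header_line)
    if m is None:
        raise ValueError("no 'SIGxxx (...) at pc=...' header found")
    return int(m.group(3), 16)


def find_region(pc: int, maps_text: str) -> Optional[str]:
    """Return the maps line whose [start, end) range contains pc, or None."""
    for raw in maps_text.splitlines():
        line = raw.strip()
        m = MAPS_RE.match(line)
        if m is None:
            continue  # skip headings such as "Dynamic libraries:" and blank lines
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start <= pc < end:
            return line
    return None


# Made-up maps excerpt for illustration only (hypothetical paths and addresses);
# paste the real section from the crash log instead.
SAMPLE_MAPS = """Dynamic libraries:
08048000-08052000 r-xp 00000000 08:01 123 /hypothetical/bin/java
b4ed0000-b4ee5000 r-xp 00000000 08:01 456 /hypothetical/lib/libexample.so
"""

if __name__ == "__main__":
    pc = parse_pc("#  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408")
    print(hex(pc), "->", find_region(pc, SAMPLE_MAPS))
```

If the program counter lands in a native library (for example a zip or compression library), that narrows the search; if it lands in an anonymous region, it is more likely JIT-compiled code or a hardware fault.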

Other than that I would go with Brian's recommendations.
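Brian's suggestion was a hardware/memory problem, and Sagar plans to stress test the memory. As a rough first pass only, a user-space pattern test like the sketch below writes fixed byte patterns over a large buffer and reads them back. The function name, patterns and default size are hypothetical choices; this only touches the pages the OS hands to one process, so a boot-time tester such as memtest86+, which covers nearly all physical RAM, remains the more trustworthy check.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bit patterns


def pattern_test(size_mb: int = 64, passes: int = 1) -> int:
    """Write each byte pattern over a buffer, read it back, and count mismatched bytes.

    Toy user-space check only: it exercises just the pages the OS gives this
    process, so it cannot replace a boot-time memory tester.
    """
    size = size_mb * 1024 * 1024
    buf = bytearray(size)
    errors = 0
    for _ in range(passes):
        for p in PATTERNS:
            expected = bytes([p]) * size
            buf[:] = expected    # write the pattern into the buffer
            if buf != expected:  # read back and compare
                errors += sum(1 for a, b in zip(buf, expected) if a != b)
    return errors


if __name__ == "__main__":
    print("mismatched bytes:", pattern_test())
```

Note that the buffer and the expected pattern together take twice size_mb, which matters on a 32-bit (i686) JVM host where a single process has limited address space.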

C
On Dec 1, 2008, at 1:59 PM, Sagar Naik wrote:

>
> Hi,
> I don't have additional information on it. If you know of any other flag
> that I need to turn on, please tell me. The flags that are currently
> on are "-XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -Dcom.sun.management.jmxremote".
> But this is what is listed in the stdout (datanode.out) file:
>
> Java version :
> java version "1.6.0_07"
> Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
> Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)
>
>
> I will try to stress test the memory.
>
> -Sagar
>
> Chris Collins wrote:
>> Was there anything mentioned as part of the tombstone message about
>> "problematic frame"? What Java are you using? There are a few
>> reasons for SIGBUS errors; one is illegal address alignment, but
>> from Java that's very unlikely. There were some issues with the
>> native zip library in older VMs. As Brian pointed out, sometimes
>> this points to a hardware issue.
>>
>> C
>> On Dec 1, 2008, at 1:32 PM, Sagar Naik wrote:
>>
>>>
>>>
>>> Brian Bockelman wrote:
>>>> Hardware/memory problems?
>>> I'm not sure.
>>>>
>>>> SIGBUS is relatively rare; it sometimes indicates a hardware  
>>>> error in the memory system, depending on your arch.
>>>>
>>> *uname -a:*
>>> Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux
>>> *top (summary header):*
>>> Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,  0.0% si
>>> Mem:   8288280k total,  1575680k used,  6712600k free,     5392k buffers
>>> Swap: 16386292k total,       68k used, 16386224k free,   522408k cached
>>>
>>> 8 cores, Xeon 2 GHz
>>>
>>>> Brian
>>>>
>>>> On Dec 1, 2008, at 3:00 PM, Sagar Naik wrote:
>>>>
>>>>> A couple of the datanodes crashed with the following error.
>>>>> /tmp is 15% occupied.
>>>>>
>>>>> #
>>>>> # An unexpected error has been detected by Java Runtime Environment:
>>>>> #
>>>>> #  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
>>>>> #
>>>>> [Too many errors, abort]
>>>>>
>>>>> Please suggest how I should go about debugging this particular problem.
>>>>>
>>>>>
>>>>> -Sagar
>>>>
>>>
>>> Thanks to Brian
>>>
>>> -Sagar
>>
>

