hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: Errors in regionserver logs
Date Wed, 02 Mar 2011 16:55:26 GMT
If HBASE-3038 is the problem, is there anything I should be aware of when upgrading while this region is in this state?

thanks,
-chris

On Mar 2, 2011, at 8:22 AM, Chris Tarnas wrote:

> I'm pretty sure I hit HBASE-3038; the recovered.edits file is over 2GB.
> 
> I'll push up my upgrade plans.
> 
> -chris
> 
> On Mar 2, 2011, at 2:44 AM, Chris Tarnas wrote:
> 
>> Actually, I see now that this EOFException is keeping a region offline. Is there any way around this error to bring the region back online? I don't have the regionserver logs from when it went offline, but here is the section of the master log from then:
>> 
>> http://pastebin.com/4ZBKGbnZ
>> 
>> thanks again
>> -chris
>> 
>> On Mar 2, 2011, at 1:03 AM, Chris Tarnas wrote:
>> 
>>> Under heavy load I've seen a few EOFException errors in my regionserver logs:
>>> 
>>> 2011-03-02 02:27:03,669 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening sequence,h7BpVjo07UDYrkBZBLwWfg\x09fc00fc97be11e00d731605f8e061462c-A2610001-1\x09,1298335975607.8a5d1e4a300792d74f516ba26de869c8.
>>> java.io.EOFException: hdfs://lxbt006-pvt:8020/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364, entryStart=2336278916, pos=2336278916, end=4672557832, edit=13370
>>> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>> 
>>> Checking the same timeframe in the namenode logs on lxbt006-pvt reveals no ominous messages (no warnings, errors, or anything else), just the same file being opened by a different node:
>>> 
>>> 2011-03-02 02:27:05,466 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop      ip=/10.56.24.13 cmd=open        src=/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364       dst=null        perm=null
>>> 
>>> 
>>> The Troubleshooting Wiki mentions it is related to swapping, but none of the nodes are swapping; they all have plenty of RAM. Are there other common causes? Should I be worried about these, or are they just "normal" exceptions? Is there anything else I should look for? I'm on cdh3b3 and will be moving to b4 once I get a chance to run it through a test cluster.
>>> 
>>> thank you,
>>> -chris
>> 
> 

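The offsets in the EOFException line up with the "file is over 2GB" diagnosis: pos=2336278916 is past Integer.MAX_VALUE (2147483647), and end=4672557832 is exactly twice pos. The following is a minimal Java sketch, not an HBase tool. It assumes only that the message has the `entryStart=..., pos=..., end=..., edit=...` shape shown in the log, and the class and method names (`RecoveredEditsOffsetCheck`, `parseFields`, `exceedsInt`) are hypothetical. It pulls out those fields and shows what each value would become if some code path narrowed it to a 32-bit int:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of HBase): parses the numeric fields of a
 * recovered.edits EOFException message and flags offsets that do not fit
 * in a signed 32-bit int.
 */
public class RecoveredEditsOffsetCheck {

    // Matches "name=number" pairs such as "entryStart=2336278916".
    private static final Pattern FIELD =
            Pattern.compile("(entryStart|pos|end|edit)=(\\d+)");

    /** Extracts the numeric fields from the message, in order of appearance. */
    static Map<String, Long> parseFields(String message) {
        Map<String, Long> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(message);
        while (m.find()) {
            fields.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return fields;
    }

    /** True if the value cannot be represented as a signed 32-bit int. */
    static boolean exceedsInt(long value) {
        return value > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        String msg = "entryStart=2336278916, pos=2336278916, end=4672557832, edit=13370";
        for (Map.Entry<String, Long> e : parseFields(msg).entrySet()) {
            long v = e.getValue();
            // (int) v keeps only the low 32 bits, which is what silent narrowing does.
            System.out.printf("%-10s %12d  > Integer.MAX_VALUE: %-5b  as int: %d%n",
                    e.getKey(), v, exceedsInt(v), (int) v);
        }
    }
}
```

Run against the values from the log, entryStart, pos and end all exceed Integer.MAX_VALUE, and pos wraps to a negative number when cast to int. This only shows that the numbers are consistent with a >2GB problem; it does not by itself confirm the exact code path HBASE-3038 fixes.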
