hbase-user mailing list archives

From Eran Kutner <e...@gigya.com>
Subject Re: Region server shutting down due to HDFS error
Date Thu, 05 Apr 2012 13:25:04 GMT
As promised I'm writing back to update the list.
It seems that after upgrading the Hadoop cluster and the ZooKeeper ensemble
to cdh3u3 (upgrading Hadoop alone wasn't enough), things are now operating
well, with no HDFS errors in the logs. I've also set
hbase.regionserver.logroll.errors.tolerated to 3, just in case. Now that the
log is clean a new exception shows up, but I'll open a separate thread about
it.
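For reference, a setting like this goes in hbase-site.xml on each region server; this is a sketch using the value 3 mentioned above (the property name is real, but check your HBase version ships it before relying on it):

```xml
<!-- Tolerate up to 3 consecutive WAL roll failures before
     the region server aborts, instead of aborting on the first. -->
<property>
  <name>hbase.regionserver.logroll.errors.tolerated</name>
  <value>3</value>
</property>
```

A restart of the region servers is needed for the change to take effect.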

Thanks everyone.

-eran



On Wed, Mar 28, 2012 at 23:06, Eran Kutner <eran@gigya.com> wrote:

> hmmm... I couldn't find it either, so I looked at the history of that
> file and, sure enough, a few check-ins back it had that message.
> I have no idea how something like this could happen. I know I had some
> merge issues when I first got the latest version and built that project, but
> I then reverted all local changes and rebuilt. The only thing I can
> imagine is that the previously compiled class file was not modified and it
> was the one that got included in the JAR, although I don't really know how
> that could happen.
>
> -eran
>
>
>
> On Wed, Mar 28, 2012 at 18:53, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Eran:
>> The error indicates a ZooKeeper-related issue.
>> Do you see a KeeperException after the ERROR log line?
>>
>> I searched the 0.90 codebase but couldn't find the exact log phrase:
>>
>> zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
>> zhihyu$ find src/main -name '*.java' -exec grep 'Error getting ' {} \; -print
>>
>> Cheers
>>
>> On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner <eran@gigya.com> wrote:
>>
>> > I don't see any prior HDFS issues in the 15 minutes before this
>> exception.
>> > The logs on the datanode reported as problematic are clean as well.
>> > However, I now see the log is full of errors like this:
>> >
>> > 2012-03-28 00:15:05,358 DEBUG
>> > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
>> > Processing close of
>> > gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
>> > 2012-03-28 00:15:05,359 WARN
>> > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Error
>> > getting node's version in CLOSING state, aborting close of
>> > gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
>> >
>> > -eran
>> >
>> >
>> >
>> > On Wed, Mar 28, 2012 at 18:38, Jean-Daniel Cryans <jdcryans@apache.org
>> > >wrote:
>> >
>> > > Any chance we can see what happened before that too? Usually you
>> > > should see a lot more HDFS spam before getting to the point where
>> > > all the datanodes are reported as bad.
>> > >
>> > > J-D
>> > >
>> > > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <eran@gigya.com> wrote:
>> > > > Hi,
>> > > >
>> > > > We have region servers sporadically stopping under load, supposedly
>> > > > due to errors writing to HDFS. Things like:
>> > > >
>> > > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error
>> > > > while syncing
>> > > > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
>> > > >
>> > > > It's happening with a different region server and data node every
>> > > > time, so it's not a problem with one specific server, and there
>> > > > doesn't seem to be anything really wrong with either of them. I've
>> > > > already increased the file descriptor limit, the datanode xceivers,
>> > > > and the datanode handler count. Any idea what could be causing these
>> > > > errors?
>> > > >
>> > > >
>> > > > A more complete log is here: http://pastebin.com/wC90xU2x
>> > > >
>> > > > Thanks.
>> > > >
>> > > > -eran
>> > >
>> >
>>
>
>
