hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: About exceptions
Date Thu, 19 Nov 2015 05:20:56 GMT
bq. because 159 region(s) in transition

This case seems to be similar to the one I saw where user table region
assignment blocked system table region assignment.

Can you take a look at the user regions that are stuck in transition?
One or more of them might be failing to open continuously. You should get
some clue by checking the region server log(s).
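
To find which regions (and which region server) to look for in those logs, a small script can pull the entries out of the BalancerChore line. This is only a sketch: the regular expression assumes the entry layout seen in the master log snippet quoted below (encoded region name, state, ts, server), the ts field is assumed to be epoch milliseconds, and the name parse_regions_in_transition is made up for illustration; it is not an HBase API.

```python
import re
from datetime import datetime, timezone

# Assumed layout of one entry, taken from the quoted master log:
#   <encoded> state=PENDING_OPEN, ts=1447827986817, server=host,port,startcode}
RIT_ENTRY = re.compile(
    r"(?P<region>[0-9a-f]{32})\s+"
    r"state=(?P<state>\w+),\s+"
    r"ts=(?P<ts>\d+),\s+"
    r"server=(?P<server>[^}\s]+)"
)

def parse_regions_in_transition(log_text):
    """Return one dict per region-in-transition entry found in log_text."""
    # Log lines are often wrapped when pasted, so collapse whitespace first.
    flat = " ".join(log_text.split())
    entries = []
    for m in RIT_ENTRY.finditer(flat):
        ts_ms = int(m.group("ts"))  # assumed to be epoch milliseconds
        entries.append({
            "region": m.group("region"),
            "state": m.group("state"),
            # Second resolution is enough to line up with region server logs.
            "since_utc": datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc),
            "server": m.group("server"),
        })
    return entries

sample = """
master.HMaster: Not running balancer because 159 region(s) in transition:
{d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d
state=PENDING_OPEN, ts=1447827986817,
server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069},
"""

for entry in parse_regions_in_transition(sample):
    print(entry["region"], entry["state"], entry["since_utc"], entry["server"])
```

With the encoded region name and the UTC time in hand, you can search the log of the server named in the entry for open attempts on that region around that time. Strip the leading "> " quote markers first if you copy the text from a mail.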

Cheers

On Wed, Nov 18, 2015 at 8:59 PM, Sumit Nigam <sumit_only@yahoo.com> wrote:

> Hello Ted,
>
> I could finally reproduce one of the issues below:
>
> 1. *Wed Nov 18* 02:27:36 EST 2015,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@1a8bbdc9,
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
> java.io.IOException: org.apache.hadoop.hbase.master.*TableNamespaceManager
> isn't ready to serve*
> at
> org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
> at
> org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
> at
> org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
> at
> org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)
> at
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:43312)
>
>
>
> At the same time, the HMaster logs show the following lines:
>
> *2015-11-17* 22:27:21,607 *WARN*  [master:ip-172-31-23-41:48470]
> master.TableNamespaceManager: *Timedout* waiting for namespace table to
> be assigned.
> 2015-11-17 22:27:21,607 INFO  [master:ip-172-31-23-41:48470]
> master.HMaster: *Master has completed initialization*
> 2015-11-17 22:31:21,616 DEBUG
> [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore]
> master.HMaster: Not running balancer because 159 region(s) in transition:
> {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d
> state=PENDING_OPEN, ts=1447827986817,
> server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069},
> 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5
> state=PE...
> 2015-11-17 22:36:21,616 DEBUG
> [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore]
> master.HMaster: *Not running balancer because 159 region(s) in transition*:
> {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d
> state=PENDING_OPEN, ts=1447827986817,
> server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069},
> 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5
> state=PE...
>
>
> I am not sure what makes it time out. I looked at that code, and it seems
> it tries to load all the regions for a given table but times out. I am not
> sure whether this points to a ZooKeeper problem, an HDFS problem, or
> something else.
>
> Would this give any clues?
>
> One more thing of interest is that the Hbase client (which shows the
> error) and the HMaster machine in this particular case are not time-synced.
> I notice a day's gap, but I assume that NTP time-sync is only a requirement
> for the Hbase master and region servers, not for their clients.
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Ted Yu <yuzhihong@gmail.com>
> *To:* Sumit Nigam <sumit_only@yahoo.com>
> *Cc:* "user@hbase.apache.org" <user@hbase.apache.org>
> *Sent:* Sunday, November 15, 2015 9:14 PM
> *Subject:* Re: About exceptions
>
> bq. if we increase #retries from our end, is there a chance that it may
> get past the issue?
>
> Most likely the chance of getting past the issue would be low without
> manually fixing the condition.
>
> For #2, it is a mystery because 0.98 master does not have Procedure V2 in
> Apache. What distro are you using ?
>
> For #3, unclean shutdown could be one of the causes. To assess this
> further, a log snippet from the master concerning the table would help.
>
> Cheers
>
>
>
> On Sun, Nov 15, 2015 at 2:25 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:
>
> Thank you Ted.
>
> I was unaware of both those issues. The issue with these exceptions is
> that they are intermittent and not easy to reproduce. So, let me see if I
> can reproduce it with trace enabled. For #1, should retrying be attempted?
> Or possibly, if we increase #retries from our end, is there a chance that
> it may get past the issue? I like the idea of the master having a WAL
> (HBASE-14190) to find and fix such inconsistencies.
>
> #2 That trace showed up in a hbase client.
>
> #3 Unclean shutdown is possibly one cause? I do not explicitly enable or
> disable tables. So, I assume those reasons may be related to Hbase code?
> Any advice on how I can avoid it in the first place?
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Ted Yu <yuzhihong@gmail.com>
> *To:* Sumit Nigam <sumit_only@yahoo.com>
> *Cc:* "user@hbase.apache.org" <user@hbase.apache.org>
> *Sent:* Sunday, November 15, 2015 3:34 PM
> *Subject:* Re: About exceptions
>
> Sumit:
> For #1, I have seen a similar issue (HBASE-14190, though on hbase 1.x
> release).
> If you have debug logging enabled, please pastebin relevant master log
> snippet so that we can take a closer look.
>
> For #2, I am a bit confused - I didn't find CreateTableProcedure.java in
> the 0.98 branch. To my knowledge, CreateTableProcedure is only in the
> hbase 1 release.
> Did you see the stack trace in the master log?
>
> For #3, there could be various reasons a table was not enabled.
> You can trace the table assignment in master log, check log from
> hbase:meta server to see if you can find some clue.
>
> bq. Hbase fails only after it exhausts its attempts so retrying may not
> be helpful?
>
> Your understanding should be correct.
>
> I want to bring your attention to HBASE-12070 which helps you fix ZK
> inconsistencies.
>
> Cheers
>
>
>
> On Sun, Nov 15, 2015 at 12:29 AM, Sumit Nigam <sumit_only@yahoo.com>
> wrote:
>
> Hi Ted,
>
> Thanks for your reply. I am using Hbase 0.98.14. I have used hbck, but for
> some (unknown) reason it has not always resolved inconsistencies.
>
> I have been able to get around these issues so far by deleting ZK entries
> for the offending table and restarting Hbase. But I am not sure what causes
> them in the first place, or whether I can avoid them through code.
> Also, upon getting these exceptions, is it a good idea to retry the
> operation? I think Hbase fails only after it exhausts its attempts so
> retrying may not be helpful?
>
>
> Here are 3 log snippets:
>
> 1. TableNamespaceManager isn't ready to serve:
>
> Fri Nov 13 17:47:19 IST 2015,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@44726f67,
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
> java.io.*IOException*: org.apache.hadoop.hbase.master.*TableNamespaceManager
> isn't ready to serve*
>         at
> org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
>         at
> org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
>         at
> org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
>         at
> org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)
>
>
>
> 2. TableExistsException:
>
> Caused by: org.apache.hadoop.hbase.TableExistsException:
> org.apache.hadoop.hbase.*TableExistsException: ldmns:exDocStore*
> at
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.prepareCreate(CreateTableProcedure.java:300)
> at
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:106)
> at
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:58)
> ...
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3403)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:632)
> at org.apache.hadoop.hbase.client.HBaseAdmin.*createTable*
> (HBaseAdmin.java:523)
>
>
> 3. TableNotEnabledException:
>
> Caused by: org.apache.hadoop.hbase.*TableNotEnabledException*:
> ldmns:DataDomain_stage is disabled.
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1139)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:963)
> at
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:74)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:833)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:810)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:842)
> at
> com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:155)
>
> ------------------------------
> *From:* Ted Yu <yuzhihong@gmail.com>
> *To:* "user@hbase.apache.org" <user@hbase.apache.org>; Sumit Nigam <
> sumit_only@yahoo.com>
> *Sent:* Sunday, November 15, 2015 10:50 AM
> *Subject:* Re: About exceptions
>
> bq. TableNotEnabledException, TableNotFoundException, IOException
>
> Can you show log snippets where these exceptions occurred ?
> Which release of hbase are you using ?
>
> Have you run hbck to repair the inconsistencies ?
>
> See http://hbase.apache.org/book.html#hbck.in.depth
>
> Cheers
>
>
>
> On Sat, Nov 14, 2015 at 8:42 PM, Sumit Nigam <sumit_only@yahoo.com.invalid
> > wrote:
>
> Hi,
>
> There are some exceptions which I face intermittently with Hbase and I
> thought some help from experts online can really help me. These are:
>
> - TableNotEnabledException
> - TableNotFoundException
> - IOException - TableNamespaceManager isn't ready to serve
>
> One of the reasons I can see for this seems to be zookeeper and Hbase/
> Hdfs data being out of sync due to an unclean shutdown.
>
> So, my questions are these:
>
> 1. Are these exceptions only related to unclean shutdowns?
> 2. Do I need to explicitly handle them and retry the operation again
> because they also seem to indicate that it is some race condition between
> trying to access a table vs Hbase enabling them?
>
> Any help is greatly appreciated.
>
> Thanks,
> Sumit
