hbase-user mailing list archives

From Sumit Nigam <sumit_o...@yahoo.com.INVALID>
Subject Re: About exceptions
Date Thu, 19 Nov 2015 04:59:19 GMT
Hello Ted,
I was finally able to reproduce one of the issues below:
1. Wed Nov 18 02:27:36 EST 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@1a8bbdc9,
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException:
org.apache.hadoop.hbase.master.TableNamespaceManager isn't ready to serve
at org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
at org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:43312)


At the same time, the HMaster log shows the following lines:
2015-11-17 22:27:21,607 WARN  [master:ip-172-31-23-41:48470] master.TableNamespaceManager: Timedout waiting for namespace table to be assigned.
2015-11-17 22:27:21,607 INFO  [master:ip-172-31-23-41:48470] master.HMaster: Master has completed initialization
2015-11-17 22:31:21,616 DEBUG [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore] master.HMaster: Not running balancer because 159 region(s) in transition: {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d state=PENDING_OPEN, ts=1447827986817, server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069}, 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5 state=PE...
2015-11-17 22:36:21,616 DEBUG [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore] master.HMaster: Not running balancer because 159 region(s) in transition: {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d state=PENDING_OPEN, ts=1447827986817, server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069}, 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5 state=PE...

I'm not sure what makes it time out. I looked at that code, and it seems to try to load all
the regions for a given table but times out. I'm not sure whether this points to a ZooKeeper
problem, an HDFS problem, or something else.
Does this give any clues?
One more thing of interest: in this particular case, the HBase client machine (which shows
the error) and the HMaster machine are not time-synced. I notice a day's gap, but I assume
NTP time sync is only a requirement for the HBase master and region servers, not also for
their clients.
Thanks,
Sumit
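Whether the client and master clocks really differ by a day is easier to judge once every timestamp is expressed in one time zone. The client trace above is stamped EST, while the master log lines do not name their zone; however, the master log also embeds epoch-millisecond values that do not depend on any machine's time zone: the `ts=` of each region in transition, and the start code at the end of each server name (`host,port,startcode`). A minimal sketch, using only `java.time`, that converts the values from the log above to UTC; the class and method names are illustrative and not part of HBase:

```java
import java.time.Instant;

public class HBaseLogTimes {
    // Server names in HBase logs have the form "host,port,startcode",
    // where startcode is the server's start time in epoch milliseconds.
    static long startCode(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host,port,startcode: " + serverName);
        }
        return Long.parseLong(parts[2].trim());
    }

    // Renders an epoch-millisecond value as an ISO-8601 instant in UTC.
    static String utc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        String master = "ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772";
        String regionServer = "ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069";
        long ritTs = 1447827986817L; // "ts=" of the PENDING_OPEN region in the log above

        System.out.println("master started:        " + utc(startCode(master)));
        System.out.println("region server started: " + utc(startCode(regionServer)));
        System.out.println("region PENDING_OPEN:   " + utc(ritTs));
        System.out.println("ms after master start: " + (ritTs - startCode(master)));
    }
}
```

Converting the client's EST timestamp to UTC as well then allows a comparison that is not confused by the two machines printing different zones.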
      From: Ted Yu <yuzhihong@gmail.com>
 To: Sumit Nigam <sumit_only@yahoo.com> 
Cc: "user@hbase.apache.org" <user@hbase.apache.org>
 Sent: Sunday, November 15, 2015 9:14 PM
 Subject: Re: About exceptions
   
bq. if we increase #retries from our end, is there a chance that it may get past the issue?
Most likely the chance of getting past the issue would be low without manually fixing the
condition.
For #2, it is a mystery because the 0.98 master in Apache HBase does not have Procedure V2. What distro
are you using?
For #3, an unclean shutdown could be one of the causes. To assess further, a log snippet
from the master concerning the table would help.
Cheers


On Sun, Nov 15, 2015 at 2:25 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:

Thank you Ted.
I was unaware of both those issues. The problem with these exceptions is that they are intermittent
and do not reproduce easily, so let me see if I can reproduce them with tracing enabled. For
#1, should retrying be attempted? Or, if we increase #retries from our end, is there
a chance that it may get past the issue? I like the idea of the master having a WAL (HBASE-14190)
to find/fix such inconsistencies.
#2 That trace showed up in an HBase client.
#3 Is an unclean shutdown possibly one cause? I do not explicitly enable/disable tables, so
I assume the reasons may be related to HBase code? And any advice on whether I can somehow avoid
it in the first place?
Thanks,
Sumit
      From: Ted Yu <yuzhihong@gmail.com>
 To: Sumit Nigam <sumit_only@yahoo.com> 
Cc: "user@hbase.apache.org" <user@hbase.apache.org> 
 Sent: Sunday, November 15, 2015 3:34 PM
 Subject: Re: About exceptions
   
Sumit:
For #1, I have seen a similar issue (HBASE-14190, though on an HBase 1.x release). If you
have debug logging enabled, please pastebin the relevant master log snippet so that we can take
a closer look.
For #2, I am a bit confused: I didn't find CreateTableProcedure.java in the 0.98 branch. To my
knowledge, CreateTableProcedure is only in the HBase 1 release. Did you see the stack trace in
the master log?
For #3, there could be various reasons a table was not enabled. You can trace the table assignment
in the master log and check the log from the hbase:meta server to see if you can find some clue.
bq. Hbase fails only after it exhausts its attempts so retrying may not be helpful?
Your understanding should be correct.
I also want to bring HBASE-12070 to your attention; it helps you fix ZK inconsistencies.
Cheers


On Sun, Nov 15, 2015 at 12:29 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:

Hi Ted,
Thanks for your reply. I am using HBase 0.98.14. I have used hbck, but for some (unknown)
reason it has not always resolved the inconsistencies.
I have been able to get around these issues so far by deleting the ZK entries for the offending
table and restarting HBase. But I am not sure what causes them in the first place, or whether I
can avoid them through code. Also, is it a good idea to retry the operation upon getting these
exceptions? I think HBase fails only after it exhausts its attempts, so retrying may not be
helpful?
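Since the client only surfaces an exception after its own retries are exhausted, it can help to estimate how long that already takes before layering more retries on top. The sketch below approximates the total backoff from `hbase.client.pause` and `hbase.client.retries.number`. The multiplier table is an assumption modeled on the `RETRY_BACKOFF` array in 0.98-era `HConstants`, the small random jitter the client adds is ignored, and the values in `main` are examples rather than confirmed defaults, so check them against your version and configuration:

```java
public class ClientRetryBudget {
    // ASSUMPTION: multipliers modeled on HConstants.RETRY_BACKOFF in
    // 0.98-era clients; verify against the HBase version you run.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Approximate total sleep (ms) across `retries` attempts: one pause between
    // each pair of consecutive attempts, i.e. retries - 1 pauses. Once the table
    // is exhausted, its last multiplier is reused. Jitter is not modeled.
    static long totalSleepMs(long pauseMs, int retries) {
        long total = 0;
        for (int i = 0; i < retries - 1; i++) {
            total += pauseMs * BACKOFF[Math.min(i, BACKOFF.length - 1)];
        }
        return total;
    }

    public static void main(String[] args) {
        long pauseMs = 100; // example value for hbase.client.pause
        int retries = 35;   // example value for hbase.client.retries.number
        System.out.printf("~%d s of backoff before the client gives up%n",
                totalSleepMs(pauseMs, retries) / 1000);
    }
}
```

If this budget already spans several minutes, an extra outer retry mostly just extends the wait rather than changing the outcome.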

Here are 3 log snippets:
1. TableNamespaceManager isn't ready to serve:
Fri Nov 13 17:47:19 IST 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@44726f67, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException:
org.apache.hadoop.hbase.master.TableNamespaceManager isn't ready to serve
at org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
at org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)


2. TableExistsException:
Caused by: org.apache.hadoop.hbase.TableExistsException: org.apache.hadoop.hbase.TableExistsException: ldmns:exDocStore
at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.prepareCreate(CreateTableProcedure.java:300)
at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:106)
at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:58)
...
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3403)
at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:632)
at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:523)

3. TableNotEnabledException:
Caused by: org.apache.hadoop.hbase.TableNotEnabledException: ldmns:DataDomain_stage is disabled.
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1139)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:963)
at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:74)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:833)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:810)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:842)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:155)
      From: Ted Yu <yuzhihong@gmail.com>
 To: "user@hbase.apache.org" <user@hbase.apache.org>; Sumit Nigam <sumit_only@yahoo.com>

 Sent: Sunday, November 15, 2015 10:50 AM
 Subject: Re: About exceptions
   
bq. TableNotEnabledException, TableNotFoundException, IOException
Can you show log snippets where these exceptions occurred? Which release of HBase are you
using?
Have you run hbck to repair the inconsistencies ?
See http://hbase.apache.org/book.html#hbck.in.depth
Cheers


On Sat, Nov 14, 2015 at 8:42 PM, Sumit Nigam <sumit_only@yahoo.com.invalid> wrote:

Hi,
There are some exceptions that I face intermittently with HBase, and I thought the experts
on this list might be able to help me. These are:
TableNotEnabledException
TableNotFoundException
IOException: TableNamespaceManager isn't ready to serve

One reason I can see for this seems to be ZooKeeper and HBase/HDFS data getting out
of sync after an unclean shutdown.
So, my questions are these:
1. Are these exceptions only related to unclean shutdowns?
2. Do I need to explicitly handle them and retry the operation, given that they also seem to
indicate some race condition between trying to access a table and HBase enabling it?
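If an application-level retry is added for question 2 anyway, a bounded one that only retries errors considered transient is a safer shape than a blanket retry, since Ted notes above that retries are unlikely to get past a condition that needs a manual fix. A minimal stdlib-only sketch: the `looksLikeMasterStartup` classifier, which matches the "isn't ready to serve" message, is a hypothetical heuristic rather than an HBase API, and the linear backoff is an arbitrary choice:

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class BoundedRetry {
    // Runs `op` up to maxAttempts times, sleeping pauseMs * attempt between
    // tries, but only when `isTransient` accepts the exception; any other
    // exception, or the failure on the last attempt, is rethrown unchanged.
    static <T> T call(Callable<T> op, int maxAttempts, long pauseMs,
                      Predicate<Exception> isTransient) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e;
                }
                Thread.sleep(pauseMs * attempt); // linear backoff
            }
        }
    }

    // HYPOTHETICAL classification: only the master's "isn't ready to serve"
    // startup condition is treated as worth another try. Errors such as
    // TableExistsException are rethrown immediately.
    static boolean looksLikeMasterStartup(Exception e) {
        String msg = e.getMessage();
        return e instanceof IOException && msg != null && msg.contains("isn't ready to serve");
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IOException("TableNamespaceManager isn't ready to serve");
            }
            return "ok";
        }, 5, 10, BoundedRetry::looksLikeMasterStartup);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```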
Any help is greatly appreciated.
Thanks,
Sumit