hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Disabling a table taking very long time
Date Mon, 28 Feb 2011 22:49:19 GMT
Are you able to tell if that call in the metaScanner is hanging, or if
it's making multiple calls to the .META. region?

If the former, then jstack the region server that hosts .META. and see
where it's blocked.

If the latter, then it means your .META. region is slow. Again, what's
going on on the RS that hosts .META.?
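
For example, something like this on the node hosting .META. (the pid
and the output path are just placeholders):

  # find the HRegionServer pid, then dump its threads
  jps | grep HRegionServer
  jstack <pid> > /tmp/rs-meta.jstack
  # look for handler threads that are BLOCKED or stuck waiting
  grep -A 12 "BLOCKED" /tmp/rs-meta.jstack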

Finally, what's the master's log like during that time?

J-D

On Mon, Feb 28, 2011 at 2:41 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
> J-D,
>
>  Thanks so much for your help so far! I sent disable commands on 4
> rather small tables and they got stuck for a long time again, so I
> took a jstack of the master. From what I can tell, all disableTable
> calls are blocked by a meta scanner thread (sample stacks below). At
> the moment there were no other requests to the server at all. How
> should I investigate this further? If it helps, here's some
> background: I have several datasets, each in a separate table. Our
> data pipeline produces a new version of each dataset every day, and
> only the latest version should be used. This is how the data is
> loaded: for each dataset, 1) run an MR job that outputs HFiles, 2)
> call loadTable.rb to create a new table, 3) disable and drop the
> previous version. As a result, some calls to load a table and drop a
> table can overlap. Please let me know if something stands out to you
> as a potential culprit. Thanks!
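>
> In case it matters, step 3 is just plain shell commands, roughly like
> this (the table name here is made up):
>
>  echo "disable 'mydataset_20110227'" | hbase shell
>  echo "drop 'mydataset_20110227'" | hbase shell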
>
> BTW, I am running Hadoop 0.20.2 with HBase 0.20.6
>
> Thread 47 (IPC Server handler 13 on 60000):
>  State: BLOCKED
>  Blocked count: 3801
>  Waited count: 72719
>  Blocked on java.lang.Object@75ac522c
>  Blocked by 27 (RegionManager.metaScanner)
>  Stack:
>    org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:154)
>    org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:842)
>    sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    java.lang.reflect.Method.invoke(Method.java:597)
>    org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>    org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
>
> Thread 27 (RegionManager.metaScanner):
>  State: WAITING
>  Blocked count: 1526058
>  Waited count: 1488998
>  Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@4dd44ab0
>  Stack:
>    java.lang.Object.wait(Native Method)
>    java.lang.Object.wait(Object.java:485)
>    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>    $Proxy1.get(Unknown Source)
>    org.apache.hadoop.hbase.master.BaseScanner.checkAssigned(BaseScanner.java:543)
>    org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:192)
>    org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>    org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>    org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>    org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>
>
>
>
> On Fri, Feb 25, 2011 at 10:23 AM, Jean-Daniel Cryans
> <jdcryans@apache.org> wrote:
>> An hour to disable? That doesn't sound right at all :)
>>
>> I would approach this problem like I generally do with HBase issues:
>> first check the master log for any weirdness regarding the problem (in
>> this case, grep for the table name).
>>
>> Then I would look at the region server log(s) of the nodes that were
>> hosting regions from that table. You should see the steps taken to
>> disable the regions (starting to close, flush, region completely
>> closed).
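>>
>> Something along these lines (the log paths and the table name are
>> placeholders for your setup):
>>
>>  # on the master
>>  grep 'mytable' /path/to/hbase/logs/hbase-*-master-*.log
>>  # on each RS that hosted regions of that table
>>  grep 'mytable' /path/to/hbase/logs/hbase-*-regionserver-*.log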
>>
>> If you are able to do it while it's taking a very long time to
>> disable, try to jstack the process that seems to be hanging.
>>
>> Finally, like I said in another thread, there's a bug in 0.20.6 that
>> almost prevents disabling (or re-enabling) a table if any region
>> recently split and the parent wasn't cleaned yet from .META.; that
>> is fixed in 0.90.1.
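>>
>> If you want to check whether that's what you're hitting, a scan of
>> .META. should show the leftover parent as an offline region with
>> split daughter entries; something like this ought to surface them
>> (column names from memory, so treat it as a sketch):
>>
>>  echo "scan '.META.', {COLUMNS => ['info:regioninfo', 'info:splitA', 'info:splitB']}" | hbase shell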
>>
>> J-D
>>
>> On Thu, Feb 24, 2011 at 11:37 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>> I think you are right, maybe in the long run I need to re-architect my
>>> system so that it doesn't need to create new and delete old tables all
>>> the time. In the short term I am having a really hard time with the
>>> disable operation: I ran a disable command on a very small table
>>> (probably a dozen MBs in size) with no clients using the cluster at
>>> all, and it took about 1 hour to complete! The weird thing is that on
>>> the web UI only the region server carrying the .META. table has
>>> non-zero requests; all other RSs have 0 requests the entire time. I
>>> would think they should get some requests, to flush the memstores at
>>> least. I *am* using the same RS nodes for some map reduce jobs at the
>>> time, and top shows that memory is almost full on the RS hosting the
>>> .META. region. Would you have some idea of what I should investigate?
>>> Thanks so much.
>>
>
