hbase-user mailing list archives

From Nanheng Wu <nanhen...@gmail.com>
Subject Re: Disabling a table taking very long time
Date Mon, 28 Feb 2011 22:41:57 GMT
J-D,

  Thanks so much for your help so far! I sent disable commands on 4
rather small tables and they got stuck for a long time again, so I
took a jstack of the master. From what I can tell, all disableTable
calls are blocked by a meta scanner thread (relevant stack traces
below). At the time there were no other requests to the server at
all. How should I investigate this further? If it helps, here's some
background: I have several datasets, each in its own table. Our data
pipeline produces a new version of each dataset every day, and only
the latest version should be used. This is how the data is loaded:
for each dataset, 1) run an MR job that outputs HFiles, 2) call
loadTable.rb to create a new table, 3) disable and drop the previous
version. As a result, some calls to load a table and drop a table can
overlap. Please let me know if something stands out to you as a
potential culprit.
Thanks!

BTW, I am running Hadoop 0.20.2 with HBase 0.20.6.
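
In case it clarifies the workflow, step 3 above is just the standard
disable/drop sequence through HBaseAdmin. Roughly, the cleanup step
does the equivalent of the sketch below (class and table names are
made up for illustration, not our actual code):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class DropOldDataset {
    public static void main(String[] args) throws Exception {
      // e.g. yesterday's version of a dataset table
      String oldTable = args[0];

      HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
      if (admin.tableExists(oldTable)) {
        admin.disableTable(oldTable);  // this is the call that gets stuck
        admin.deleteTable(oldTable);
      }
    }
  }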

Thread 47 (IPC Server handler 13 on 60000):
  State: BLOCKED
  Blocked count: 3801
  Waited count: 72719
  Blocked on java.lang.Object@75ac522c
  Blocked by 27 (RegionManager.metaScanner)
  Stack:
    org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:154)
    org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:842)
    sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)


Thread 27 (RegionManager.metaScanner):
  State: WAITING
  Blocked count: 1526058
  Waited count: 1488998
  Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@4dd44ab0
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
    $Proxy1.get(Unknown Source)
    org.apache.hadoop.hbase.master.BaseScanner.checkAssigned(BaseScanner.java:543)
    org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:192)
    org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
    org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
    org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
    org.apache.hadoop.hbase.Chore.run(Chore.java:68)




On Fri, Feb 25, 2011 at 10:23 AM, Jean-Daniel Cryans
<jdcryans@apache.org> wrote:
> An hour to disable? That doesn't sound right at all :)
>
> I would approach this problem like I generally do with HBase issues:
> first check the master log for any weirdness related to the problem
> (in this case, grep for the table name).
>
> Then I would look at the region server log(s) of the nodes that were
> hosting regions from that table. You should see the steps taken to
> disable the regions (starting to close, flush, region completely
> closed).
>
> If you are able to do it while it's taking a very long time to
> disable, try to jstack the process that seems to be hanging.
>
> Finally, like I said in another thread, there's a bug in 0.20.6 that
> almost prevents disabling (or re-enabling) a table if any region
> recently split and the parent wasn't cleaned yet from .META.; that
> is fixed in 0.90.1.
>
> J-D
>
> On Thu, Feb 24, 2011 at 11:37 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>> I think you are right, maybe in the long run I need to re-architect my
>> system so that it doesn't need to create new and delete old tables all
>> the time. In the short term I am having a really hard time with the
>> disable operation: I ran a disable command on a very small table
>> (probably a dozen MBs in size) with no clients using the cluster at
>> all, and it took about an hour to complete! The weird thing is that on
>> the web UI only the region server carrying the META table has non-zero
>> requests; all the other RSs have 0 requests the entire time. I would
>> think they should get at least some requests to flush the memstore. I
>> *am* using the same RS nodes for some MapReduce jobs at the time, and
>> top shows that memory usage is almost full on the META region server.
>> Would you have any idea of what I should investigate?
>> Thanks so much.
>
