hbase-user mailing list archives

From Nanheng Wu <nanhen...@gmail.com>
Subject Re: What's the region server doing?
Date Wed, 02 Mar 2011 01:51:51 GMT
Alright, let's do that. Quick dinner first. Thanks JD!

On Tue, Mar 1, 2011 at 5:48 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> next() is the call to get the next row from a scan.
>
> Maybe you aren't looking at the right region server? If you'd like to
> speed up this debugging session, feel free to drop by the #hbase
> channel on freenode, then we could report the results on the mailing
> list.
>
> J-D
>
> On Tue, Mar 1, 2011 at 5:43 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>> And what's "next?" .... and what's next?
>>
>> On Tue, Mar 1, 2011 at 5:41 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>> I just took the stack trace of both master and the meta RS. The
>>> master's still waiting for that thread which called "next", but no IPC
>>> Server handler on the RS has that call. Is that possible? Or have I
>>> just stared at this thing for too long?
>>>
>>> On Tue, Mar 1, 2011 at 5:32 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>> Yes, and on the other side (which is the region server that hosts
>>>> .META.) you should be able to see that call. Well, not that specific
>>>> one, but one of them :)
>>>>
>>>> J-D
>>>>
>>>> On Tue, Mar 1, 2011 at 5:30 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>>>> You said "next". I don't know if this is related at all, but from the
>>>>> master's thread dump, it says the disable is blocked by this thread
>>>>> below, and it's calling next:
>>>>>
>>>>> Thread 27 (RegionManager.metaScanner):
>>>>>  State: WAITING
>>>>>  Blocked count: 69503
>>>>>  Waited count: 68805
>>>>>  Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@42fcac6
>>>>>  Stack:
>>>>>    java.lang.Object.wait(Native Method)
>>>>>    java.lang.Object.wait(Object.java:485)
>>>>>    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>>>>>    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>>>>>    $Proxy1.next(Unknown Source)
>>>>>    org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:179)
>>>>>    org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>>>>>    org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>>>>>    org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>>>>>    org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>>>>>
>>>>> On Tue, Mar 1, 2011 at 5:22 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>>>>> Thanks man, I'll try that and post back when I find something. BTW, I
>>>>>> ran the script to set the memstore flush size on .META., and now I am
>>>>>> seeing a lot less writing to HDFS from the .META. RS and less
>>>>>> compaction. Unfortunately it's still slow. :(
>>>>>>
>>>>>> On Tue, Mar 1, 2011 at 5:15 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>> In that specific jstack it's doing nothing at all, but keep in mind
>>>>>>> that it's only a snapshot of a precise moment in time. Try jstack'ing
>>>>>>> a few times and at some point you should see the threads named like
>>>>>>> "IPC Server handler xx on 60020" (where xx is a number) showing bigger
>>>>>>> stack traces with HRegionServer doing stuff like get, next, put, etc.
>>>>>>>
>>>>>>> You should also try scanning '.META.' from the shell and if it's slow,
>>>>>>> do the jstack'ing at the same time.
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Tue, Mar 1, 2011 at 5:07 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>>>>>>> My cluster (10 nodes, hbase-0.20.6 + hadoop 0.20.2) is very very slow
>>>>>>>> for any operation like disable table or delete. Master's thread dump
>>>>>>>> says they are blocked by the metaScanner thread. When I looked at the
>>>>>>>> log file on the .META. RS there is no output at all! (INFO debug
>>>>>>>> level). J-D has been helping me on this, and we pretty much figured out
>>>>>>>> that RegionManager.metaScanner is the culprit, because it's taking
>>>>>>>> around 25 minutes to scan 8K rows. What I don't get is what the region
>>>>>>>> server is actually doing during this time. There are no requests at all
>>>>>>>> on the cluster, and no RS splits either, because we just use a MR job to
>>>>>>>> output HFiles and never write again.
>>>>>>>> J-D has been really really helpful, but I feel like I took too much of
>>>>>>>> his time. Below is the thread dump of the .META. RS during the time
>>>>>>>> when disable commands are blocked on the meta scanner. Can someone help
>>>>>>>> me figure out what the server is doing? Is it running any thread at all?
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> http://pastebin.com/CZQAywq3
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
