hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HBase client hangs after upgrade to 0.20.4 when used from reducer
Date Wed, 12 May 2010 19:02:56 GMT
Hi Friso,

Also, if you can capture a jstack of the regionservers at the time,
that would be great.
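The usual way to get such a dump is to run jstack against the region server's process ID. As a hedged alternative for a JVM you control (a reducer, for example), the hypothetical ThreadDumper class below, which is not part of HBase or Hadoop, builds a similar dump in-process with the standard java.lang.management API. It can only see the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of HBase or Hadoop: a jstack-like dump of
// every live thread in the current JVM, built on java.lang.management.
public class ThreadDumper {

    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for lock details only if this JVM supports reporting them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The monitor this thread is blocked or waiting on, if any.
                sb.append(" on ").append(info.getLockName());
            }
            sb.append('\n');
            // Print every frame; ThreadInfo.toString() would cut the trace short.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```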

-Todd

On Wed, May 12, 2010 at 9:26 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Friso,
>
> Unfortunately, it's hard to determine the cause from the information
> provided; the client call you pasted is pretty much normal, i.e. the
> client is waiting to receive a result from a region server.
>
> The fact that you can't shut down the master when this happens is very
> concerning. Do you still have the master logs from that time? The region
> server logs too? Can you post them on pastebin or on a web server?
>
> Also, feel free to come chat with us on IRC (#hbase on freenode); it's
> always easier to debug live.
>
> J-D
>
> On Wed, May 12, 2010 at 8:31 AM, Friso van Vollenhoven
> <fvanvollenhoven@xebia.com> wrote:
>> Hi all,
>>
>> I am using Hadoop (0.20.2) and HBase to periodically import data (every 15 minutes).
>> There are a number of import processes, but generally they all create a sequence file on
>> HDFS, which is then run through a MapReduce job. The MapReduce job uses the identity mapper
>> (the input file is a Hadoop sequence file) and a specialized reducer that does the following:
>> - Combine the values for a key into one value
>> - Do a Get from HBase to retrieve existing values for the same key
>> - Combine the existing value from HBase and the new one into one value again
>> - Put the final value into HBase under the same key (thus overwriting the existing
>>   row; I keep only one version)
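The reducer's read-merge-write cycle described above can be modelled without a cluster. The sketch below is hypothetical and is not the poster's code: a HashMap stands in for the HBase table, its get() and put() stand in for HTable.get() and HTable.put(), and summing Longs stands in for whatever merge the real reducer performs. It shows only the order of operations per key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical, in-memory model of the reducer's Get-merge-Put cycle.
// The HashMap is a stand-in for the HBase table; nothing here talks to HBase.
public class MergeReducerSketch<V> {
    private final Map<String, V> table = new HashMap<>();
    private final BinaryOperator<V> merge;   // assumed associative

    public MergeReducerSketch(BinaryOperator<V> merge) {
        this.merge = merge;
    }

    // Called once per key with all values the shuffle grouped under it.
    public void reduce(String rowKey, List<V> values) {
        // 1. Combine the values for this key into one value.
        V combined = values.stream().reduce(merge)
                .orElseThrow(IllegalArgumentException::new);
        // 2. "Get" the existing value for the same row, if any.
        V existing = table.get(rowKey);
        // 3. Combine the existing value with the new one.
        V result = (existing == null) ? combined : merge.apply(existing, combined);
        // 4. "Put" it back under the same key, replacing the old value.
        table.put(rowKey, result);
    }

    public V get(String rowKey) {
        return table.get(rowKey);
    }

    public static void main(String[] args) {
        // Summing counts is an assumed merge function, for illustration only.
        MergeReducerSketch<Long> r = new MergeReducerSketch<>(Long::sum);
        r.reduce("row1", Arrays.asList(1L, 2L));
        r.reduce("row1", Arrays.asList(4L));
        System.out.println(r.get("row1"));
    }
}
```

In this shape every key costs one Get round trip before its Put, all on the reducer's single thread, so one Get that never returns stalls the whole reduce, which is consistent with the "main" thread being stuck in HTable.get in the dump.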
>>
>> After I upgraded HBase to the 0.20.4 release, the reducers sometimes start hanging
>> on a Get. When a job starts, some reducers run to completion fine, but after a while
>> the last reducers start to hang. Eventually the hanging reducers are killed off by
>> Hadoop (after 600 secs).
>>
>> I did a thread dump for one of the hanging reducers. It looks like this:
>> "main" prio=10 tid=0x0000000048083800 nid=0x4c93 in Object.wait() [0x00000000420ca000]
>>   java.lang.Thread.State: WAITING (on object monitor)
>>        at java.lang.Object.wait(Native Method)
>>        - waiting on <0x00002aaaaeb50d70> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>>        at java.lang.Object.wait(Object.java:485)
>>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
>>        - locked <0x00002aaaaeb50d70> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>>        at $Proxy2.get(Unknown Source)
>>        at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:450)
>>        at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:448)
>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
>>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:447)
>>        at net.ripe.inrdb.hbase.accessor.real.HBaseTableAccessor.get(HBaseTableAccessor.java:36)
>>        at net.ripe.inrdb.hbase.store.HBaseStoreUpdater.getExistingRecords(HBaseStoreUpdater.java:101)
>>        at net.ripe.inrdb.hbase.store.HBaseStoreUpdater.mergeTimelinesWithExistingRecords(HBaseStoreUpdater.java:60)
>>        at net.ripe.inrdb.hbase.store.HBaseStoreUpdater.doInsert(HBaseStoreUpdater.java:40)
>>        at net.ripe.inrdb.core.store.SinglePartitionStore$Updater.insert(SinglePartitionStore.java:92)
>>        at net.ripe.inrdb.core.store.CompositeStore$CompositeStoreUpdater.insert(CompositeStore.java:142)
>>        at net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:70)
>>        at net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:17)
>>        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> So the client hangs in a wait() call, waiting on an HBaseClient$Call object. I looked
>> at the code: the wait is in a while() loop and has no timeout, so it follows that it
>> never gets out of there if notify() is never called on the object. However, I am not
>> sure exactly what condition it is waiting for.
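The pattern described above, an untimed wait() inside a while() loop, can be reproduced in a few lines. The PendingCall class below is a simplified, hypothetical model, not HBaseClient's actual code: awaitForever() hangs for good if complete() is never called, while await(timeoutMs) shows one common way to bound such a wait. A timeout would not explain why the response never arrives; it only turns a silent hang into an exception that can be logged.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified, hypothetical model of a pending RPC call; NOT HBaseClient's code.
public class PendingCall {
    private boolean done;   // guarded by this
    private Object value;   // guarded by this

    // Invoked by the (hypothetical) connection thread when the response arrives.
    public synchronized void complete(Object v) {
        value = v;
        done = true;
        notifyAll();        // wake every thread waiting on this call
    }

    // Untimed: if complete() is never called, this loops in wait() forever.
    public synchronized Object awaitForever() throws InterruptedException {
        while (!done) {
            wait();
        }
        return value;
    }

    // Timed: gives up once timeoutMs has elapsed without a response.
    public synchronized Object await(long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!done) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                throw new TimeoutException("no response after " + timeoutMs + " ms");
            }
            // wait() can return early (notify or spurious wakeup), so loop and re-check.
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        PendingCall lost = new PendingCall();       // nobody will ever complete this
        try {
            lost.await(100);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }

        PendingCall answered = new PendingCall();
        new Thread(() -> answered.complete("result")).start();
        System.out.println(answered.await(5000));
    }
}
```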
>>
>> Meanwhile, once this has happened, I cannot shut down the master server normally; I
>> have to kill -9 it. Before this problem occurs, the master server shuts down just fine.
>> (Sorry, I didn't take a thread dump of the master, and I have since downgraded to
>> 0.20.3 again.)
>>
>> I cannot reproduce this error on my local setup (developer machine). It only occurs
>> on our (currently modest) cluster: one machine running master+NN+ZooKeeper and four
>> datanodes, which are also task trackers and region servers. The inputs to the periodic
>> MapReduce jobs are very small (from a few KB to several MB) and thus contain relatively
>> few records. I know this is not very efficient in MapReduce because of the job startup
>> overhead, and that it would be faster if the importer process inserted the records
>> itself, but we are setting up this architecture of importers and insertion jobs for
>> anticipated larger loads (up to 80 million records per day).
>>
>> Does anyone have a clue about what is happening, or where to look to investigate further?
>>
>> Thanks a lot!
>>
>>
>> Cheers,
>> Friso
>>
>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera
