hbase-user mailing list archives

From "Ted Tuttle" <ted.tut...@mentacapital.com>
Subject RS unresponsive after series of deletes
Date Wed, 13 Jun 2012 19:09:52 GMT
Hi All-

I have a repeatable and troublesome HBase interaction that I would like some advice on.  

I am running a 5-node cluster on v0.94 on cdh3u3, accessed through the Java client API. Each
RS has 32G of RAM and is running with a 16G heap, with 4G for block cache. Used heap on each
RS is well below the 16G available.

My client code has a set of deletes to carry out.  After successfully issuing 19 such deletes,
the client begins logging HBase errors while trying to complete the deletes.  It logs an ERROR
every 60s, 10 times in all, and then gives up.

I estimate that the client successfully deleted about 270MB of data in the first 19 deletes.
Each batch delete covers about 144 rows, with a row size of about 100KB.
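
The 270MB estimate can be checked with a little arithmetic. This is a rough Java sketch using only the approximate figures quoted above; the class and method names are made up for illustration, and decimal units (1MB = 1000KB) are an assumption. It comes to 2736 rows and about 273.6MB.

```java
// Rough check of the estimate above; every input is an approximate figure from the post.
class DeleteVolumeEstimate {
    // Total size in KB for a number of batches, rows per batch, and row size in KB.
    static long totalKb(long batches, long rowsPerBatch, long rowSizeKb) {
        return batches * rowsPerBatch * rowSizeKb;
    }

    public static void main(String[] args) {
        long rows = 19L * 144L;            // successful batches * rows per batch
        long kb = totalKb(19, 144, 100);   // 100KB per row
        // Assumption: decimal units, 1MB = 1000KB.
        System.out.println("rows=" + rows + ", sizeKb=" + kb + ", sizeMb~" + (kb / 1000.0));
    }
}
```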

Here is the first of the 10 ERRORs logged in the client: http://pastebin.com/QMJsbgkZ.  Client errors
occur once per minute between 00:22:48 and 00:32:58, with the final error being: http://pastebin.com/ajaVxYUZ

Ultimately, the RS became responsive again. Looking at monitoring, I see a spike in CPU utilization
on the unresponsive node; it goes from 2% utilization to 20% and stays there for a few
minutes.  None of the other nodes in the cluster appear busy at this time.

Logs from the unresponsive RS are here: http://pastebin.com/z9qxGuJS.  There are no ERRORs in the
log around the time of the unresponsiveness.

It appears from the server log that the "responseTooSlow" operation completed about 7min after
the client gave up.  

So, any ideas what was making the RS unresponsive? Did it really take 17min to delete 280MB
of data?  
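
The 17min figure can be reconstructed from the timestamps above. This is a rough Java sketch, assuming the first and last client errors bound the client-side window and that the server-side operation finished about 7min after the client gave up; the class and method names are illustrative only.

```java
import java.time.Duration;
import java.time.LocalTime;

// Reconstructs the approximate duration of the slow operation from the reported times.
class UnresponsiveWindow {
    // Time between the first and last client ERROR.
    static Duration clientWindow(LocalTime firstError, LocalTime lastError) {
        return Duration.between(firstError, lastError);
    }

    public static void main(String[] args) {
        Duration client = clientWindow(LocalTime.parse("00:22:48"), LocalTime.parse("00:32:58"));
        // Assumption: "responseTooSlow" completed roughly 7 minutes after the last client error.
        Duration total = client.plusMinutes(7);
        System.out.println("client window: " + client.getSeconds() + "s, total: about "
                + total.toMinutes() + "min");
    }
}
```

The client-side window of roughly 10min is consistent with 10 attempts at about 60s each.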

I can easily change client RPC timeouts and number of retries, but I feel there is something I
am missing.  Any suggestions?

