hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Region server request throughput drops to zero
Date Mon, 04 Oct 2010 03:55:20 GMT
During the event try jstack'ing the affected regionservers. That is usually
extremely illuminating.
On Oct 3, 2010 8:06 PM, "James Baldassari" <jbaldassari@gmail.com> wrote:
> Hi,
>
> We've been having a strange problem with our HBase cluster recently (0.20.5
> + HBASE-2599 + IHBase-0.20.5). Everything will be working fine, doing
> mostly gets at 5-10k/sec and an hourly bulk insert (using HTable puts) that
> can spike the total throughput up to 15-50k ops/sec, but at some point the
> cluster gets into this state where the request throughput (gets and puts)
> drops to zero across 5 of our 6 region servers. Restarting the whole
> cluster is the only way to fix the problem, but it gets back into that bad
> state again after 4-12 hours.
>
> Nothing in the region server or master logs indicates any errors except
> occasional DFS client timeouts. The logs look exactly like they do during
> normal operation, even with debug logging on. I have GC logging on as well,
> and there are no long GC pauses (the region servers have 11G of heap). When
> the request rate drops the load is low on the region servers, there is
> little to no I/O wait, and there are no messages in the region server logs
> indicating that the region servers are busy doing anything like a
> compaction. It seems like the region servers just decided to stop
> processing requests. We have three different client applications sending
> requests to HBase, and they all drop to zero requests/second at the same
> time, so I don't think it's an issue on the client side. There are no
> errors in our client logs either.
>
> Our hbase-site.xml is here: http://pastebin.com/cJ4cnH5W
>
> Any ideas what could be causing the cluster to freeze up? I guess my next
> plan is to get thread dumps on the region servers and the clients the next
> time it happens. Is there somewhere else I should look other than the
> master and region server logs?
>
> Thanks,
> James
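
The thread-dump approach discussed above can be sketched as a small script. This is only an illustration, not something from the thread: it assumes the JDK's jstack is on the PATH and that the region server's process ID is passed as the first argument, and the helper names (capture_dump, count_thread_states, top_frames) are made up for this example. It tallies thread states and the most common top stack frames, which may make it easier to spot where request handler threads are parked or blocked while throughput is at zero.

```python
import re
import subprocess
import sys
from collections import Counter

# jstack prints a line like "java.lang.Thread.State: WAITING (parking)" per thread.
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")
# The first "at ..." line after the state line is the thread's top stack frame.
TOP_FRAME_RE = re.compile(r"java\.lang\.Thread\.State: \w+[^\n]*\n\s+at ([^\n(]+)")


def capture_dump(pid):
    """Run `jstack <pid>` and return its output (assumes jstack is on the PATH)."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_thread_states(dump_text):
    """Count how many threads are in each state (RUNNABLE, BLOCKED, ...)."""
    return Counter(STATE_RE.findall(dump_text))


def top_frames(dump_text):
    """Count the method at the top of each thread's stack."""
    return Counter(TOP_FRAME_RE.findall(dump_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    dump = capture_dump(int(sys.argv[1]))
    print("Thread states:")
    for state, n in count_thread_states(dump).most_common():
        print(f"  {state}: {n}")
    print("Most common top frames:")
    for frame, n in top_frames(dump).most_common(10):
        print(f"  {n:4d}  {frame}")
```

Taking a few dumps several seconds apart during the stall and comparing the summaries can help distinguish threads that are genuinely stuck from ones that are merely idle.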
