hbase-user mailing list archives

From Yi Liang <white...@gmail.com>
Subject Re: Read speed down after long running
Date Thu, 29 Dec 2011 01:54:36 GMT
Lars, Ram:

I don't restart the client processes (in my case, they're Thrift servers); I
only restart the master and the region servers. Do you mean I should also
restart the Thrift servers?

I'm now checking the Thrift server code; it does seem to call methods such as
createTable() and deleteTable() somewhere.

I don't see any clues when checking the region servers with jstack. Which
thread states should I check more carefully? When the problem occurs, we see
higher IO than usual; memory and network look OK.

Thank you for your suggestions!

On Wed, Dec 28, 2011 at 4:21 PM, Gaojinchao <gaojinchao@huawei.com> wrote:

> I think you need to check the thread dumps (client and RS) and the resources
> (memory, IO and network) of your cluster.
> -----Original Message-----
> From: Lars H [mailto:lhofhansl@yahoo.com]
> Sent: December 28, 2011 0:32
> To: user@hbase.apache.org
> Cc: hbase-user@hadoop.apache.org
> Subject: Re: Read speed down after long running
> When you restart HBase are you also restarting the client process?
> Are you using HBaseAdmin.tableExists?
> If so you might be running into HBASE-5073
> -- Lars
> Yi Liang <whitesky@gmail.com> wrote:
> >Hi all,
> >
> >We're running HBase 0.90.3 for a read-intensive application.
> >
> >We find that after running for a long time (2 weeks, 1 month, or longer),
> >read speed becomes much lower.
> >
> >For example, a Thrift get_rows operation fetching 20 rows (about 4k per
> >row) can take >2 seconds, sometimes even >5 seconds. When this
> >happens, cpu_wio stays at about 10.
> >
> >But if we restart HBase (only the master and region servers) with
> >stop-hbase.sh and start-hbase.sh, read speed returns to normal immediately,
> >at <200 ms per get_rows operation, and cpu_wio drops to
> >about 2.
> >
> >When the problem appears, there are no exceptions in the logs and no
> >flushes or compactions; nothing abnormal except occasional warnings like
> >the one below:
> >2011-12-27 15:50:20,307 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> >IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog;
> >editcount=1, len~=9.8k
> >
> >Our cluster has 10 region servers, each with a 25 GB heap, 64% of which
> >is used for cache. Some M/R jobs run continuously in another cluster to
> >feed data into this HBase cluster. Every night we run a flush and a major
> >compaction. Usually there is no flush or compaction in the daytime.
> >
> >Could anybody explain why read speed drops after running for a long
> >time, and why it returns to normal immediately after restarting HBase?
> >
> >Any advice will be highly appreciated.
> >
> >Thanks,
> >Yi
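
To see whether the slow-append warnings quoted above become more frequent as
the cluster ages, the region server logs could be scanned with something like
the following. This is only a rough sketch: the regular expression is written
against the single warning quoted in this thread, it assumes each warning
sits on one line in the actual log file (the mail client wrapped it here),
and the function name parse_slow_appends is made up for illustration.

```python
import re

# Pattern modeled on the one HLog warning quoted in this thread (HBase 0.90.3).
# Assumption: other versions may word the message differently.
SLOW_APPEND = re.compile(
    r"IPC Server handler (?P<handler>\d+) on (?P<port>\d+) "
    r"took (?P<ms>\d+) ms appending an edit to hlog"
)


def parse_slow_appends(lines):
    """Yield (handler, port, milliseconds) for each slow-append warning."""
    for line in lines:
        match = SLOW_APPEND.search(line)
        if match:
            yield (
                int(match.group("handler")),
                int(match.group("port")),
                int(match.group("ms")),
            )


# The warning from the original message, rejoined onto one line.
sample = [
    "2011-12-27 15:50:20,307 WARN "
    "org.apache.hadoop.hbase.regionserver.wal.HLog: "
    "IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog; "
    "editcount=1, len~=9.8k",
    "2011-12-27 15:50:21,000 INFO some.other.Logger: unrelated line",
]

slow = list(parse_slow_appends(sample))
```

Running it over a whole log file (for example, parse_slow_appends(open(path)))
and grouping the results by day would show whether append latency grows along
with the read slowdown, or whether the warnings are unrelated noise.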
