Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of whitesky@gmail.com designates
 209.85.210.169 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAGz=dyD7qbsp_D2R_8KqM_u+KR4GHrAXUztAfq+BmD13iEh5xw@mail.gmail.com>
References: <ct8hheih0tck89lq912b3ubx.1325003183910@email.android.com>
	<E41F4E6CFB766541888FC855399F07C5A3D3F0@szxeml525-mbs.china.huawei.com>
	<CAGz=dyD7qbsp_D2R_8KqM_u+KR4GHrAXUztAfq+BmD13iEh5xw@mail.gmail.com>
Date: Thu, 29 Dec 2011 12:26:08 +0800
Message-ID: 
 <CAGz=dyCQfuF9STpHOnTWtOEh6c3JNOqG9jictGUT-eZqeppmvQ@mail.gmail.com>
Subject: Re: Read speed down after long running
From: Yi Liang <whitesky@gmail.com>
To: user@hbase.apache.org
Cc: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=14dae9340877ee23c804b53383f5

--14dae9340877ee23c804b53383f5
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: quoted-printable

Sorry, I forgot there's another kind of client process: the Java MapReduce
jobs that write data. I don't restart them either. They're usually
short-lived.

I think neither the M/R jobs nor the thrift servers would execute
HBaseAdmin.tableExists, because we use them only for get and put
operations. The M/R jobs put and get data; the thrift servers
get rows of data. All tables were created once and have never been
altered or deleted since.
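
To see whether the slow HLog appends line up with the slow reads, here is a
minimal sketch for pulling the latency out of the region server logs. It is
only an illustration: the regex is inferred from the single WARN line quoted
further down in this thread, other HBase versions may word the message
differently, and the names slow_appends, SlowAppend and min_ms are my own
hypothetical helpers, not anything provided by HBase.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern inferred from the one WARN line quoted in this thread (assumption);
# it expects the whole log entry on a single line, as in the original log file.
SLOW_APPEND = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"\S*HLog: IPC Server handler (?P<handler>\d+) on (?P<port>\d+) "
    r"took (?P<ms>\d+) ms appending an edit to hlog; "
    r"editcount=(?P<edits>\d+), len~=(?P<length>\S+)"
)


class SlowAppend(NamedTuple):
    """One parsed slow-append warning (hypothetical helper type)."""
    timestamp: str
    handler: int
    millis: int
    length: str


def slow_appends(lines: Iterable[str], min_ms: int = 1000) -> Iterator[SlowAppend]:
    """Yield warnings whose append time is at least min_ms milliseconds."""
    for line in lines:
        m = SLOW_APPEND.match(line)
        if m and int(m["ms"]) >= min_ms:
            yield SlowAppend(m["ts"], int(m["handler"]), int(m["ms"]), m["length"])


if __name__ == "__main__":
    sample = (
        "2011-12-27 15:50:20,307 WARN "
        "org.apache.hadoop.hbase.regionserver.wal.HLog: "
        "IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog; "
        "editcount=1, len~=9.8k"
    )
    for hit in slow_appends([sample]):
        print(hit.timestamp, hit.millis, "ms", hit.length)
```

Feeding it the lines of a region server log (for example via open(path))
gives the timestamps to compare against the periods when get_rows is slow.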

2011/12/29 Yi Liang <whitesky@gmail.com>

> Lars, Ram:
>
> I don't restart the client processes (in my case, they're thrift servers); I
> only restart the master and RS. Do you mean I should also restart the
> thrift servers?
>
> I'm now checking the code of the thrift server; it seems that it does use
> HBaseAdmin.tableExists somewhere, e.g. in createTable() and deleteTable().
>
> Jinchao:
> I don't see any clue when checking the RS with jstack. Which states/threads
> should I check more carefully? When the problem occurs, we see more IO
> than usual; the memory and network look ok.
>
> Thank you for your suggestions!
> Yi
>
> On Wed, Dec 28, 2011 at 4:21 PM, Gaojinchao <gaojinchao@huawei.com> wrote:
>
>> I think you need to check the thread dumps (client and RS) and the
>> resources (memory, IO and network) of your cluster.
>>
>> -----Original Message-----
>> From: Lars H [mailto:lhofhansl@yahoo.com]
>> Sent: December 28, 2011 0:32
>> To: user@hbase.apache.org
>> Cc: hbase-user@hadoop.apache.org
>> Subject: Re: Read speed down after long running
>>
>> When you restart HBase are you also restarting the client process?
>> Are you using HBaseAdmin.tableExists?
>> If so you might be running into HBASE-5073
>>
>> -- Lars
>>
>> Yi Liang <whitesky@gmail.com> wrote:
>>
>> >Hi all,
>> >
>> >We're running hbase 0.90.3 for one read-intensive application.
>> >
>> >We find that after running for a long time (2 weeks, 1 month or longer),
>> >the read speed becomes much lower.
>> >
>> >For example, a thrift get_rows operation fetching 20 rows (about 4k per
>> >row) can take >2 seconds, sometimes even >5 seconds. When this
>> >happens, we see cpu_wio stay at about 10.
>> >
>> >But if we restart hbase (only master and regionservers) with stop-hbase.sh
>> >and start-hbase.sh, the read speed returns to normal immediately,
>> >which is <200 ms for every get_rows operation, and cpu_wio drops to
>> >about 2.
>> >
>> >When the problem appears, there's no exception in the logs and no
>> >flush/compaction; nothing abnormal except a few warning logs now and
>> >then, like the one below:
>> >2011-12-27 15:50:20,307 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
>> >IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog;
>> >editcount=1, len~=9.8k
>> >
>> >Our cluster has 10 region servers, each with a 25g heap, 64% of which
>> >is used for cache. There are some M/R jobs that keep running in another
>> >cluster to feed data into this hbase. Every night, we do a flush and major
>> >compaction. Usually there's no flush or compaction in the daytime.
>> >
>> >Could anybody explain why the read speed could become lower after long
>> >running, and why it goes back to normal immediately after restarting hbase?
>> >
>> >Any advice will be highly appreciated.
>> >
>> >Thanks,
>> >Yi
>>
>
>

--14dae9340877ee23c804b53383f5--