Date: Thu, 29 Dec 2011 12:26:08 +0800
Subject: Re: Read speed down after long running
From: Yi Liang
To: user@hbase.apache.org
Cc: "hbase-user@hadoop.apache.org"

Sorry, I forgot there's another kind of client process: the Java MapReduce
jobs that write data. I don't restart them either, but they're usually
short-lived.

I think neither the M/R jobs nor the thrift servers would execute
HBaseAdmin.tableExists, because we use them only for get or put operations.
The M/R jobs put and get data; the thrift servers get rows of data. All
tables were created once and have never been altered or deleted since.

2011/12/29 Yi Liang

> Lars, Ram:
>
> I don't restart the client processes (in my case, they're thrift servers); I
> only restart the master and rs. Do you mean I should also restart the
> thrift servers?
>
> I'm now checking the code of the thrift server. It does seem to use
> HBaseAdmin.tableExists in places such as createTable() and deleteTable().
>
> Jinchao:
> I don't see any clue when checking the rs with jstack. Which states/threads
> should I check more carefully? When the problem occurs, we see higher IO
> than usual; the memory and network look ok.
>
> Thank you for your suggestions!
> Yi
>
> On Wed, Dec 28, 2011 at 4:21 PM, Gaojinchao wrote:
>
>> I think you need to check the thread dumps (client and RS) and
>> resources (memory, IO and network) of your cluster.
>>
>> -----Original Message-----
>> From: Lars H [mailto:lhofhansl@yahoo.com]
>> Sent: December 28, 2011 0:32
>> To: user@hbase.apache.org
>> Cc: hbase-user@hadoop.apache.org
>> Subject: Re: Read speed down after long running
>>
>> When you restart HBase are you also restarting the client process?
>> Are you using HBaseAdmin.tableExists?
>> If so you might be running into HBASE-5073
>>
>> -- Lars
>>
>> Yi Liang wrote:
>>
>> >Hi all,
>> >
>> >We're running hbase 0.90.3 for one read-intensive application.
>> >
>> >We find that after running for a long time (2 weeks, 1 month, or longer),
>> >the read speed becomes much lower.
>> >
>> >For example, a thrift get_rows operation fetching 20 rows (about 4k per
>> >row) can take >2 seconds, sometimes even >5 seconds. When this happens,
>> >cpu_wio stays at about 10.
>> >
>> >But if we restart hbase (only the master and regionservers) with
>> >stop-hbase.sh and start-hbase.sh, the read speed returns to normal
>> >immediately, which is <200 ms for every get_rows operation, and cpu_wio
>> >drops to about 2.
>> >
>> >When the problem appears, there are no exceptions in the logs and no
>> >flush/compaction; nothing abnormal except an occasional warning like the
>> >one below:
>> >2011-12-27 15:50:20,307 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog; editcount=1, len~=9.8k
>> >
>> >Our cluster has 10 region servers, each with a 25g heap, 64% of which is
>> >used for cache. There are some m/r jobs that keep running in another
>> >cluster to feed data into this hbase. Every night, we do a flush and major
>> >compaction. Usually there's no flush or compaction in the daytime.
>> >
>> >Could anybody explain why the read speed becomes lower after running for a
>> >long time, and why it returns to normal immediately after restarting hbase?
>> >
>> >Any advice will be highly appreciated.
>> >
>> >Thanks,
>> >Yi