From: lars hofhansl
Date: Mon, 1 Jul 2013 03:59:01 -0700 (PDT)
Subject: Re: Poor HBase map-reduce scan performance
To: user@hbase.apache.org

Absolutely.

----- Original Message -----
From: Ted Yu
To: user@hbase.apache.org
Cc:
Sent: Sunday, June 30, 2013 9:32 PM
Subject: Re: Poor HBase map-reduce scan performance

Looking at the tail of HBASE-8369, there were some comments which are yet
to be addressed.

I think trunk patch should be finalized before backporting.

Cheers

On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller wrote:

> I'll attach my patch to HBASE-8369 tomorrow.
>
> On Jun 28, 2013, at 10:56 AM, lars hofhansl wrote:
>
> > If we can make a clean patch with minimal impact to existing code I
> > would be supportive of a backport to 0.94.
> >
> > -- Lars
> >
> > ----- Original Message -----
> > From: Bryan Keller <bryanck@gmail.com>
> > To: user@hbase.apache.org; lars hofhansl
> > Cc:
> > Sent: Tuesday, June 25, 2013 1:56 AM
> > Subject: Re: Poor HBase map-reduce scan performance
> >
> > I tweaked Enis's snapshot input format and backported it to 0.94.6 and
> > have snapshot scanning functional on my system. Performance is dramatically
> > better, as expected I suppose. I'm seeing about 3.6x faster performance vs
> > TableInputFormat. Also, HBase doesn't get bogged down during a scan as the
> > regionserver is being bypassed. I'm very excited by this.
> > There are some issues with file permissions and library dependencies but
> > nothing that can't be worked out.
> >
> > On Jun 5, 2013, at 6:03 PM, lars hofhansl wrote:
> >
> >> That's exactly the kind of pre-fetching I was investigating a bit ago
> >> (made a patch, but ran out of time).
> >> This pre-fetching is strictly client only, where the client keeps the
> >> server busy while it is processing the previous batch, by filling up a 2nd
> >> buffer.
> >>
> >> -- Lars
> >>
> >> ________________________________
> >> From: Sandy Pratt
> >> To: "user@hbase.apache.org"
> >> Sent: Wednesday, June 5, 2013 10:58 AM
> >> Subject: Re: Poor HBase map-reduce scan performance
> >>
> >> Yong,
> >>
> >> As a thought experiment, imagine how it impacts the throughput of TCP to
> >> keep the window size at 1. That means there's only one packet in flight
> >> at a time, and total throughput is a fraction of what it could be.
> >>
> >> That's effectively what happens with RPC. The server sends a batch, then
> >> does nothing while it waits for the client to ask for more. During that
> >> time, the pipe between them is empty. Increasing the batch size can help
> >> a bit, in essence creating a really huge packet, but the problem remains.
> >> There will always be stalls in the pipe.
> >>
> >> What you want is for the window size to be large enough that the pipe is
> >> saturated. A streaming API accomplishes that by stuffing data down the
> >> network pipe as quickly as possible.
> >>
> >> Sandy
> >>
> >> On 6/5/13 7:55 AM, "yonghu" wrote:
> >>
> >>> Can anyone explain why client + rpc + server will decrease the performance
> >>> of scanning? I mean the Regionserver and Tasktracker are the same node when
> >>> you use MapReduce to scan the HBase table.
> >>> So, in my understanding, there will be no rpc cost.
> >>>
> >>> Thanks!
> >>>
> >>> Yong
> >>>
> >>> On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt wrote:
> >>>
> >>>> https://issues.apache.org/jira/browse/HBASE-8691
> >>>>
> >>>> On 6/4/13 6:11 PM, "Sandy Pratt" wrote:
> >>>>
> >>>>> Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
> >>>>> with an update in the meantime.
> >>>>>
> >>>>> I tried a number of different approaches to eliminate latency and
> >>>>> "bubbles" in the scan pipeline, and eventually arrived at adding a
> >>>>> streaming scan API to the region server, along with refactoring the scan
> >>>>> interface into an event-driven message receiver interface. In so doing, I
> >>>>> was able to take scan speed on my cluster from 59,537 records/sec with the
> >>>>> classic scanner to 222,703 records per second with my new scan API.
> >>>>> Needless to say, I'm pleased ;)
> >>>>>
> >>>>> More details forthcoming when I get a chance.
> >>>>>
> >>>>> Thanks,
> >>>>> Sandy
> >>>>>
> >>>>> On 5/23/13 3:47 PM, "Ted Yu" wrote:
> >>>>>
> >>>>>> Thanks for the update, Sandy.
> >>>>>>
> >>>>>> If you can open a JIRA and attach your producer / consumer scanner there,
> >>>>>> that would be great.
> >>>>>>
> >>>>>> On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt wrote:
> >>>>>>
> >>>>>>> I wrote myself a Scanner wrapper that uses a producer/consumer queue to
> >>>>>>> keep the client fed with a full buffer as much as possible. When scanning
> >>>>>>> my table with scanner caching at 100 records, I see about a 24% uplift in
> >>>>>>> performance (~35k records/sec with the ClientScanner and ~44k records/sec
> >>>>>>> with my P/C scanner). However, when I set scanner caching to
> >>>>>>> 5000, it's more of a wash compared to the standard ClientScanner: ~53k
> >>>>>>> records/sec with the ClientScanner and ~60k records/sec with the P/C
> >>>>>>> scanner.
> >>>>>>>
> >>>>>>> I'm not sure what to make of those results. I think next I'll shut down
> >>>>>>> HBase and read the HFiles directly, to see if there's a drop off in
> >>>>>>> performance between reading them directly vs. via the RegionServer.
> >>>>>>>
> >>>>>>> I still think that to really solve this there needs to be a sliding window
> >>>>>>> of records in flight between disk and RS, and between RS and client. I'm
> >>>>>>> thinking there's probably a single batch of records in flight between RS
> >>>>>>> and client at the moment.
> >>>>>>>
> >>>>>>> Sandy
> >>>>>>>
> >>>>>>> On 5/23/13 8:45 AM, "Bryan Keller" wrote:
> >>>>>>>
> >>>>>>>> I am considering scanning a snapshot instead of the table. I believe this
> >>>>>>>> is what the ExportSnapshot class does. If I could use the scanning code
> >>>>>>>> from ExportSnapshot then I will be able to scan the HDFS files directly
> >>>>>>>> and bypass the regionservers. This could potentially give me a huge boost
> >>>>>>>> in performance for full table scans. However, it doesn't really address
> >>>>>>>> the poor scan performance against a table.
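
For readers of the archive, here is a minimal sketch of the producer/consumer prefetching idea discussed in the thread above. It is an illustration only: it is not Sandy's wrapper, not the HBASE-8691 patch, and it uses no HBase API. The class name PrefetchingIterator and the batchesInFlight parameter are hypothetical, and the batch source is a plain Iterator standing in for whatever fetches one batch per RPC. A background thread keeps a bounded queue of batches filled while the caller processes the previous batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical illustration of client-side prefetching; not part of HBase. */
final class PrefetchingIterator<T> implements Iterator<T> {
    // Sentinel batch, compared by identity, that marks the end of the stream.
    private final List<T> endMarker = new ArrayList<>();
    private final BlockingQueue<List<T>> queue;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean done = false;

    /**
     * @param batches         assumed batch source, e.g. one element per fetch
     * @param batchesInFlight how many fetched batches may wait in the buffer
     */
    PrefetchingIterator(Iterator<List<T>> batches, int batchesInFlight) {
        queue = new ArrayBlockingQueue<>(batchesInFlight);
        Thread producer = new Thread(() -> {
            try {
                try {
                    // put() blocks when the buffer is full, so the producer
                    // stays at most batchesInFlight batches ahead.
                    while (batches.hasNext()) {
                        queue.put(batches.next());
                    }
                } finally {
                    // Always signal the end. A failing source therefore ends
                    // the stream early; real code would propagate the error.
                    queue.put(endMarker);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        // Pull batches until one has records or the end marker arrives.
        while (!current.hasNext() && !done) {
            try {
                List<T> batch = queue.take();
                if (batch == endMarker) {
                    done = true;
                } else {
                    current = batch.iterator();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                done = true;
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        List<List<Integer>> batches =
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5));
        Iterator<Integer> records = new PrefetchingIterator<>(batches.iterator(), 2);
        while (records.hasNext()) {
            System.out.println(records.next());
        }
    }
}
```

Note that this only overlaps fetching with processing on the client. It still issues one request per batch, so it does not provide the sliding window between RS and client that the thread argues for.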