Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: local policy)
References: 
 <CABALG=SwrQcryygGkrPLcBEohSN0QK5SXMGoOKyShecd93LgHw@mail.gmail.com>
	<CDD4C94A.285A5%prattrs@adobe.com>
	<1370480627.39897.YahooMailNeo@web140604.mail.bf1.yahoo.com>
	<FAF17FD1-0E0B-418E-9458-3C7E5BA9704E@gmail.com>
	<1372442213.11396.YahooMailNeo@web140603.mail.bf1.yahoo.com>
	<30DC2A0F-FB22-44BB-B97F-EDD417F813B9@gmail.com>
 <CALte62yD-4Utr_xyUsrLBmeNkt8qXXS2eJbKispc0XZEuUA5Fg@mail.gmail.com>
Message-ID: <1372676341.34654.YahooMailNeo@web140602.mail.bf1.yahoo.com>
Date: Mon, 1 Jul 2013 03:59:01 -0700 (PDT)
From: lars hofhansl <larsh@apache.org>
Reply-To: lars hofhansl <larsh@apache.org>
Subject: Re: Poor HBase map-reduce scan performance
To: "user@hbase.apache.org" <user@hbase.apache.org>
In-Reply-To: 
 <CALte62yD-4Utr_xyUsrLBmeNkt8qXXS2eJbKispc0XZEuUA5Fg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Absolutely.

----- Original Message -----
From: Ted Yu <yuzhihong@gmail.com>
To: user@hbase.apache.org
Cc:
Sent: Sunday, June 30, 2013 9:32 PM
Subject: Re: Poor HBase map-reduce scan performance

Looking at the tail of HBASE-8369, there were some comments which are yet
to be addressed.

I think the trunk patch should be finalized before backporting.

Cheers

On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller <bryanck@gmail.com> wrote:

> I'll attach my patch to HBASE-8369 tomorrow.
>
> On Jun 28, 2013, at 10:56 AM, lars hofhansl <larsh@apache.org> wrote:
>
> > If we can make a clean patch with minimal impact to existing code I
> > would be supportive of a backport to 0.94.
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Bryan Keller <bryanck@gmail.com>
> > To: user@hbase.apache.org; lars hofhansl <larsh@apache.org>
> > Cc:
> > Sent: Tuesday, June 25, 2013 1:56 AM
> > Subject: Re: Poor HBase map-reduce scan performance
> >
> > I tweaked Enis's snapshot input format and backported it to 0.94.6 and
> > have snapshot scanning functional on my system. Performance is
> > dramatically better, as expected I suppose. I'm seeing about 3.6x faster
> > performance vs TableInputFormat. Also, HBase doesn't get bogged down
> > during a scan as the regionserver is being bypassed. I'm very excited by
> > this. There are some issues with file permissions and library
> > dependencies but nothing that can't be worked out.
> >
> > On Jun 5, 2013, at 6:03 PM, lars hofhansl <larsh@apache.org> wrote:
> >
> >> That's exactly the kind of pre-fetching I was investigating a bit ago
> >> (made a patch, but ran out of time).
> >> This pre-fetching is strictly client only, where the client keeps the
> >> server busy while it is processing the previous batch, but filling up
> >> a 2nd buffer.
> >>
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ________________________________
> >> From: Sandy Pratt <prattrs@adobe.com>
> >> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >> Sent: Wednesday, June 5, 2013 10:58 AM
> >> Subject: Re: Poor HBase map-reduce scan performance
> >>
> >>
> >> Yong,
> >>
> >> As a thought experiment, imagine how it impacts the throughput of TCP
> >> to keep the window size at 1. That means there's only one packet in
> >> flight at a time, and total throughput is a fraction of what it could
> >> be.
> >>
> >> That's effectively what happens with RPC. The server sends a batch,
> >> then does nothing while it waits for the client to ask for more. During
> >> that time, the pipe between them is empty. Increasing the batch size
> >> can help a bit, in essence creating a really huge packet, but the
> >> problem remains. There will always be stalls in the pipe.
> >>
> >> What you want is for the window size to be large enough that the pipe
> >> is saturated. A streaming API accomplishes that by stuffing data down
> >> the network pipe as quickly as possible.
> >>
> >> Sandy
> >>
> >> On 6/5/13 7:55 AM, "yonghu" <yongyong313@gmail.com> wrote:
> >>
> >>> Can anyone explain why client + rpc + server will decrease the
> >>> performance of scanning? I mean the Regionserver and Tasktracker are
> >>> the same node when you use MapReduce to scan the HBase table. So, in
> >>> my understanding, there will be no rpc cost.
> >>>
> >>> Thanks!
> >>>
> >>> Yong
> >>>
> >>>
> >>> On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt <prattrs@adobe.com> wrote:
> >>>
> >>>> https://issues.apache.org/jira/browse/HBASE-8691
> >>>>
> >>>>
> >>>> On 6/4/13 6:11 PM, "Sandy Pratt" <prattrs@adobe.com> wrote:
> >>>>
> >>>>> Haven't had a chance to write a JIRA yet, but I thought I'd pop in
> >>>>> here with an update in the meantime.
> >>>>>
> >>>>> I tried a number of different approaches to eliminate latency and
> >>>>> "bubbles" in the scan pipeline, and eventually arrived at adding a
> >>>>> streaming scan API to the region server, along with refactoring the
> >>>>> scan interface into an event-driven message receiver interface. In
> >>>>> so doing, I was able to take scan speed on my cluster from 59,537
> >>>>> records/sec with the classic scanner to 222,703 records/sec with my
> >>>>> new scan API. Needless to say, I'm pleased ;)
> >>>>>
> >>>>> More details forthcoming when I get a chance.
> >>>>>
> >>>>> Thanks,
> >>>>> Sandy
> >>>>>
> >>>>> On 5/23/13 3:47 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks for the update, Sandy.
> >>>>>>
> >>>>>> If you can open a JIRA and attach your producer / consumer scanner
> >>>>>> there, that would be great.
> >>>>>>
> >>>>>> On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt <prattrs@adobe.com> wrote:
> >>>>>>
> >>>>>>> I wrote myself a Scanner wrapper that uses a producer/consumer
> >>>>>>> queue to keep the client fed with a full buffer as much as
> >>>>>>> possible. When scanning my table with scanner caching at 100
> >>>>>>> records, I see about a 24% uplift in performance (~35k
> >>>>>>> records/sec with the ClientScanner and ~44k records/sec with my
> >>>>>>> P/C scanner). However, when I set scanner caching to 5000, it's
> >>>>>>> more of a wash compared to the standard ClientScanner: ~53k
> >>>>>>> records/sec with the ClientScanner and ~60k records/sec with the
> >>>>>>> P/C scanner.
> >>>>>>>
> >>>>>>> I'm not sure what to make of those results. I think next I'll
> >>>>>>> shut down HBase and read the HFiles directly, to see if there's a
> >>>>>>> drop off in performance between reading them directly vs. via the
> >>>>>>> RegionServer.
> >>>>>>>
> >>>>>>> I still think that to really solve this there needs to be a
> >>>>>>> sliding window of records in flight between disk and RS, and
> >>>>>>> between RS and client. I'm thinking there's probably a single
> >>>>>>> batch of records in flight between RS and client at the moment.
> >>>>>>>
> >>>>>>> Sandy
> >>>>>>>
> >>>>>>> On 5/23/13 8:45 AM, "Bryan Keller" <bryanck@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I am considering scanning a snapshot instead of the table. I
> >>>>>>>> believe this is what the ExportSnapshot class does. If I could
> >>>>>>>> use the scanning code from ExportSnapshot then I will be able to
> >>>>>>>> scan the HDFS files directly and bypass the regionservers. This
> >>>>>>>> could potentially give me a huge boost in performance for full
> >>>>>>>> table scans. However, it doesn't really address the poor scan
> >>>>>>>> performance against a table.
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >
>
>
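
The client-side pre-fetching discussed in this thread (a producer keeps
requesting the next batch while the consumer is still processing the
previous one) can be illustrated with a minimal sketch. It is written in
Python for brevity and does not use the HBase client API: PrefetchingScanner,
fetch_batch, and depth are hypothetical names, and fetch_batch only stands in
for one scanner RPC that returns a list of records, or an empty list when the
scan is finished. It is not the wrapper or the patch mentioned above, just one
way such a producer/consumer queue might look.

```python
import queue
import threading

_DONE = object()  # sentinel: the producer has finished (or failed)


class PrefetchingScanner:
    """Hypothetical wrapper that fetches batches ahead of the consumer."""

    def __init__(self, fetch_batch, depth=1):
        # fetch_batch is an assumed stand-in for one scanner RPC: it returns
        # a list of records, or an empty list at the end of the scan.
        self._fetch_batch = fetch_batch
        # depth = number of already-fetched batches allowed to wait in the queue.
        self._batches = queue.Queue(maxsize=depth)
        self._error = None
        self._producer = threading.Thread(target=self._fill, daemon=True)
        self._producer.start()

    def _fill(self):
        """Producer: keep asking for the next batch until the scan ends."""
        try:
            while True:
                batch = self._fetch_batch()
                if not batch:
                    break
                self._batches.put(batch)  # blocks while the queue is full
        except Exception as exc:
            self._error = exc  # re-raised on the consumer side
        finally:
            self._batches.put(_DONE)

    def __iter__(self):
        """Consumer: yield records in order, one buffered batch at a time."""
        while True:
            batch = self._batches.get()
            if batch is _DONE:
                if self._error is not None:
                    raise self._error
                return
            yield from batch


if __name__ == "__main__":
    fake_batches = iter([["row1", "row2"], ["row3"]])
    for record in PrefetchingScanner(lambda: next(fake_batches, [])):
        print(record)
```

With depth=1, one fetched batch waits in the queue while the consumer works
on the previous one, which roughly corresponds to the "2nd buffer" described
above. A larger depth lets more batches queue up at the cost of client memory.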