hbase-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: ResultScanner performance
Date Wed, 27 Aug 2014 17:20:24 GMT
Hi,

The reason we cannot close the ResultScanner (or issue a multi-get) is
that we have wide rows with many columns, and we want to iterate over them
rather than fetch all the columns at once.

There is also a special but common case where we only need the first
column of each row. Is there a better way to do this than multiple scans + take(1)?
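
A minimal sketch of this access pattern, in plain Java rather than HBase
client code. PageFetcher, PagedColumnIterator and InMemoryFetcher are
hypothetical names, not part of any library; fetch() stands in for one
stateless request that returns a window of columns (in HBase that might be a
Get with a ColumnPaginationFilter, which is an assumption and has not been
measured here). Nothing stays open between pages, and the first-column case
is simply a page size of 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Small demo: walks one wide row two columns at a time. */
final class PagedColumnsDemo {
    public static void main(String[] args) {
        Map<String, List<String>> rows = new HashMap<>();
        rows.put("row1", Arrays.asList("c1", "c2", "c3", "c4", "c5"));
        InMemoryFetcher fetcher = new InMemoryFetcher(rows);
        Iterator<String> columns = new PagedColumnIterator(fetcher, "row1", 2);
        while (columns.hasNext()) {
            System.out.println(columns.next());
        }
        System.out.println("requests: " + fetcher.requests);
    }
}

/** Hypothetical stand-in for one stateless request against the store. */
interface PageFetcher {
    /** Returns up to limit columns of rowKey, starting at column index offset. */
    List<String> fetch(String rowKey, int offset, int limit);
}

/** Iterates over the columns of one row page by page; holds no scanner between pages. */
final class PagedColumnIterator implements Iterator<String> {
    private final PageFetcher fetcher;
    private final String rowKey;
    private final int pageSize;
    private List<String> page = Collections.emptyList();
    private int posInPage = 0;
    private int nextOffset = 0;
    private boolean exhausted = false;

    PagedColumnIterator(PageFetcher fetcher, String rowKey, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetcher = fetcher;
        this.rowKey = rowKey;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (posInPage < page.size()) {
            return true;               // still have buffered columns
        }
        if (exhausted) {
            return false;              // last page was short, nothing left
        }
        page = fetcher.fetch(rowKey, nextOffset, pageSize);
        posInPage = 0;
        nextOffset += page.size();
        if (page.size() < pageSize) {
            exhausted = true;          // a short page means the row has ended
        }
        return !page.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(posInPage++);
    }
}

/** In-memory fetcher used only to exercise the iterator; counts requests. */
final class InMemoryFetcher implements PageFetcher {
    private final Map<String, List<String>> rows;
    int requests = 0;

    InMemoryFetcher(Map<String, List<String>> rows) {
        this.rows = rows;
    }

    @Override
    public List<String> fetch(String rowKey, int offset, int limit) {
        requests++;
        List<String> cols = rows.getOrDefault(rowKey, Collections.emptyList());
        int from = Math.min(offset, cols.size());
        int to = Math.min(from + limit, cols.size());
        return new ArrayList<>(cols.subList(from, to));
    }
}
```

The trade-off in this sketch is one request per page instead of one
long-lived scanner per row, so the page size controls how many round trips a
full iteration costs.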

Jianshi



On Wed, Aug 27, 2014 at 12:44 PM, Dai, Kevin <yundai@ebay.com> wrote:

> Hi, Ted
>
> I think you are right. But we must hold the ResultScanner for a while. So
> is there any way to reduce the performance loss? Or is there any way to
> share the connection?
>
> Best regards,
> Kevin.
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: August 27, 2014 11:36
> To: user@hbase.apache.org
> Subject: Re: ResultScanner performance
>
> Keeping many ResultScanners open at the same time is not good for
> performance.
>
> Please see:
> http://hbase.apache.org/book.html#perf.hbase.client.scannerclose
>
> After fetching results from ResultScanner, you should close it ASAP.
>
> Cheers
>
>
> On Tue, Aug 26, 2014 at 8:18 PM, Dai, Kevin <yundai@ebay.com> wrote:
>
> > Hi, Ted
> >
> > We have a cluster of 48 machines and at least 100 TB of data (which is
> > still growing).
> > The problem is that we have a lot of row keys (tens of thousands) to
> > query at the same time, and we don't fetch all the data at once;
> > instead we fetch it when needed, so we may hold tens of thousands of
> > ResultScanners open at the same time.
> > I want to know whether this will hurt performance and network
> > resources and, if so, whether there is any way to solve it.
> >
> > Best regards,
> > Kevin.
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: August 26, 2014 16:49
> > To: user@hbase.apache.org
> > Cc: user@hbase.apache.org; Huang, Jianshi
> > Subject: Re: ResultScanner performance
> >
> > Can you give a bit more detail?
> > What size is the cluster / dataset?
> > What problem are you solving?
> > Would using a coprocessor help reduce the usage of ResultScanner?
> >
> > Cheers
> >
> > On Aug 26, 2014, at 12:13 AM, "Dai, Kevin" <yundai@ebay.com> wrote:
> >
> > > Hi, everyone
> > >
> > > My application will hold tens of thousands of ResultScanners to get
> > > data. Will it hurt performance and network resources?
> > > If so, is there any way to solve it?
> > > Thanks,
> > > Kevin.
> >
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
