hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Scan vs Get
Date Wed, 20 May 2015 13:01:55 GMT
Ok. I found a clean way to improve that a lot without going with the
filter. I will open a JIRA and push a fix.

The idea is to set the scanner caching to at most LIMIT, so we don't read
the entire table before returning to the shell. We also have to change
where we test for the limit.
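
As a rough illustration of why capping the caching at LIMIT helps, here is a
toy Ruby model. It is not the actual table.rb or HBase client code: the
Row struct, next_batch, scan_with_limit, the column name 'v:rare' and the
caching value of 100 are all made up for the sketch. It only assumes that each
server call keeps scanning until it has collected `caching` matching rows (or
hits the end of the table), and that LIMIT is checked on the client side.

```ruby
# Toy model (NOT HBase code): each "server call" scans forward until it has
# collected `caching` rows containing the requested column, or runs out of rows.
Row = Struct.new(:key, :columns)

def next_batch(table, start, caching, column)
  batch = []
  pos = start
  while pos < table.size && batch.size < caching
    row = table[pos]
    batch << row if row.columns.include?(column)
    pos += 1
  end
  [batch, pos]
end

# Client-side loop: LIMIT is only checked once rows have come back in a batch.
def scan_with_limit(table, column, limit, caching)
  results = []
  pos = 0
  while pos < table.size && results.size < limit
    batch, pos = next_batch(table, pos, caching, column)
    batch.each do |row|
      results << row
      break if results.size >= limit
    end
  end
  [results, pos] # pos is the number of rows the "server" examined
end

# 65,000 rows; only row '000a' carries the hypothetical column 'v:rare'.
table = Array.new(65_000) do |i|
  Row.new(format('%04x', i), i == 10 ? ['v:rare'] : ['v:other'])
end

_, examined_default = scan_with_limit(table, 'v:rare', 1, 100)
_, examined_capped  = scan_with_limit(table, 'v:rare', 1, 1)
puts "caching=100: examined #{examined_default} rows"
puts "caching=1:   examined #{examined_capped} rows"
```

In this model, LIMIT => 1 with a caching of 100 still examines all 65,000
rows, because the first batch never fills up. With the caching capped at 1,
the scan stops after 11 rows.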

Anyway, JIRA 13721 is open; I will push something there today.

Thanks,

JM

2015-05-19 23:51 GMT-04:00 Ted Yu <yuzhihong@gmail.com>:

> For PageFilter :
>
>  * Implementation of Filter interface that limits results to a specific page
>  * size. It terminates scanning once the number of filter-passed rows is >
>  * the given page size.
>
> In your case, what should be the page size - 0 ?
>
> Cheers
>
> On Tue, May 19, 2015 at 8:30 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Oh, I see! So basically we do a full table scan because it never returns a
> > 2nd row, so we never reach that break and we exit only when we reach the
> > end of the table. Hence the same performance without the limit
> > parameter...
> >
> > Should we then try to add a filter like PageFilter to the scan if we have a
> > LIMIT? At least that might avoid a full scan?
> >
> > 2015-05-19 23:14 GMT-04:00 Matteo Bertozzi <theo.bertozzi@gmail.com>:
> >
> > > Take a look at table.rb _scan_internal()
> > > LIMIT is not passed to the server, so you fetch more rows
> > >
> > > https://github.com/apache/hbase/blob/master/hbase-shell/src/main/ruby/hbase/table.rb#L495
> > >
> > > Matteo
> > >
> > >
> > > On Tue, May 19, 2015 at 8:11 PM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > I ran scan/get/scan/get many times, and always got the same pattern.
> > > > You can remove the "LIMIT => 1" parameter and you will get the same
> > > > performance.
> > > >
> > > > Scan and get without the QC return in very similar times: 191ms for
> > > > one, 194ms for the other.
> > > >
> > > > 2015-05-19 23:02 GMT-04:00 Ted Yu <yuzhihong@gmail.com>:
> > > >
> > > > > J-M:
> > > > > How many times did you try the pair of queries ?
> > > > >
> > > > > Since scan was run first, this would give the get query some advantage,
> > > > > right ?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Tue, May 19, 2015 at 7:34 PM, Jean-Marc Spaggiari <
> > > > > jean-marc@spaggiari.org> wrote:
> > > > >
> > > > > > Aren't Scans and Gets supposed to be almost equally fast?
> > > > > >
> > > > > > I have a pretty small table with 65K rows and a few columns (a hundred?),
> > > > > > and I am trying to do a get and a scan.
> > > > > >
> > > > > > hbase(main):009:0> scan 'sensors', { COLUMNS => ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], STARTROW => '000a', LIMIT => 1 }
> > > > > > ROW    COLUMN+CELL
> > > > > >  000a  column=v:f92acb5b-079a-42bc-913a-657f270a3dc1, timestamp=1432088038576, value=\x08000aHf92acb5b-079a-42bc-913a-657f270a3dc1\x0EFAILURE\x0CNE-858\x140-0000-000\x02\x96\x01SXOAXTPSIUFPPNUCIEVQGCIZHCEJBKGWINHKIHFRHWHNATAHAHQBFRAYLOAMQEGKLNZIFM
> > > > > > 000a
> > > > > > 1 row(s) in 12.6720 seconds
> > > > > >
> > > > > > hbase(main):010:0> get 'sensors', '000a', {COLUMN => 'v:f92acb5b-079a-42bc-913a-657f270a3dc1'}
> > > > > > COLUMN    CELL
> > > > > >  v:f92acb5b-079a-42bc-913a-657f270a3dc1  timestamp=1432088038576, value=\x08000aHf92acb5b-079a-42bc-913a-657f270a3dc1\x0EFAILURE\x0CNE-858\x140-0000-000\x02\x96\x01SXOAXTPSIUFPPNUCIEVQGCIZHCEJBKGWINHKIHFRHWHNATAHAHQBFRAYLOAMQEGKLNZIFM
> > > > > > 000a
> > > > > >
> > > > > > 1 row(s) in 0.0280 seconds
> > > > > >
> > > > > >
> > > > > > They both return the same result. However, the get returns in 28ms while
> > > > > > the scan returns in 12672ms.
> > > > > >
> > > > > > How can the scan be that slow? Is that normal? If I remove the QC from
> > > > > > the scan, then it takes only 250ms to return all the columns. I think
> > > > > > something is not correct.
> > > > > >
> > > > > > I'm running on 1.0.0-cdh5.4.0 so I guess it's the same for 1.0.x...
> > > > > >
> > > > > > JM
> > > > > >
> > > > >
> > > >
> > >
> >
>
