hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Scan vs Get
Date Wed, 20 May 2015 03:30:09 GMT
Oh, I see! So basically we do a full table scan because the scanner never
returns a 2nd row, so we never reach that break and we exit only when we
reach the end of the table. That is why the performance is the same with or
without the LIMIT parameter...
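
To make that reasoning concrete, here is a minimal Ruby sketch of a limit that is checked only on the client, after the scanner has been asked for another row. It is a toy model, not the code from table.rb: FakeScanner, build_table and client_side_limit_scan are hypothetical names, and the table is assumed to hold 65,000 rows of which only the first ('000a') carries the requested qualifier.

```ruby
# Toy model only: none of these names come from HBase or table.rb.
class FakeScanner
  attr_reader :rows_examined

  def initialize(rows, qualifier)
    @rows = rows
    @qualifier = qualifier
    @pos = 0
    @rows_examined = 0
  end

  # Walks forward until a row containing the qualifier is found.
  # Returns that row key, or nil once the end of the table is reached.
  def next_row
    while @pos < @rows.length
      key, columns = @rows[@pos]
      @pos += 1
      @rows_examined += 1
      return key if columns.include?(@qualifier)
    end
    nil
  end
end

# 65,000 rows starting at key '000a'; only the first one has 'wanted'.
def build_table
  Array.new(65_000) do |i|
    [format('%04x', i + 10), i.zero? ? ['wanted'] : ['other']]
  end
end

# The limit is only checked after the scanner has produced another row,
# so with a single matching row the break is never reached.
def client_side_limit_scan(limit)
  scanner = FakeScanner.new(build_table, 'wanted')
  results = []
  while (key = scanner.next_row)
    break if limit > 0 && results.length >= limit
    results << key
  end
  [results, scanner.rows_examined]
end

[1, 0].each do |limit|
  rows, examined = client_side_limit_scan(limit)
  puts "limit=#{limit}: returned #{rows.inspect}, examined #{examined} rows"
end
```

In this model both runs return only '000a' and both examine all 65,000 rows, which matches the observation that dropping LIMIT does not change the timing.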

Should we then try to add a filter like PageFilter to the scan if we have a
LIMIT? At least that might avoid a full scan?
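
A rough Ruby sketch of what a limit enforced on the server side would change. This is a toy model under stated assumptions, not HBase code: build_table and scan_with_server_side_limit are hypothetical names, the table is assumed to hold 65,000 rows with only the first ('000a') carrying the requested qualifier, and the real PageFilter is evaluated separately on each region server, so it can return more rows than the page size.

```ruby
# Toy model only: none of these names come from HBase.

# 65,000 rows starting at key '000a'; only the first one has 'wanted'.
def build_table
  Array.new(65_000) do |i|
    [format('%04x', i + 10), i.zero? ? ['wanted'] : ['other']]
  end
end

# The page size travels with the scan, so the "server" stops walking
# the table as soon as it has collected enough matching rows.
def scan_with_server_side_limit(rows, qualifier, page_size)
  matched = []
  examined = 0
  rows.each do |key, columns|
    break if matched.length >= page_size
    examined += 1
    matched << key if columns.include?(qualifier)
  end
  [matched, examined]
end

matched, examined = scan_with_server_side_limit(build_table, 'wanted', 1)
puts "returned #{matched.inspect}, examined #{examined} rows"
```

In this model the scan stops right after the first matching row instead of walking to the end of the table. Whether PageFilter behaves this way for a column-restricted scan on a real cluster would need to be measured.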

2015-05-19 23:14 GMT-04:00 Matteo Bertozzi <theo.bertozzi@gmail.com>:

> Take a look at table.rb _scan_internal()
> LIMIT is not passed to the server, so you fetch more rows
>
> https://github.com/apache/hbase/blob/master/hbase-shell/src/main/ruby/hbase/table.rb#L495
>
> Matteo
>
>
> On Tue, May 19, 2015 at 8:11 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > I ran scan/get/scan/get many times and always saw the same pattern.
> > You can remove the "LIMIT => 1" parameter and you will get the same
> > performance.
> >
> > Scan and get without the QC return in very similar times: 191ms for one,
> > 194ms for the other.
> >
> > 2015-05-19 23:02 GMT-04:00 Ted Yu <yuzhihong@gmail.com>:
> >
> > > J-M:
> > > How many times did you try the pair of queries?
> > >
> > > Since the scan was run first, this would give the get query some
> > > advantage, right?
> > >
> > > Cheers
> > >
> > > On Tue, May 19, 2015 at 7:34 PM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > Aren't scans and gets supposed to be almost equally fast?
> > > >
> > > > I have a pretty small table with 65K rows and a few columns (a hundred?),
> > > > and I am trying to do a get and a scan.
> > > >
> > > > hbase(main):009:0> scan 'sensors', { COLUMNS =>
> > > > ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], STARTROW => '000a', LIMIT => 1 }
> > > > ROW    COLUMN+CELL
> > > >  000a  column=v:f92acb5b-079a-42bc-913a-657f270a3dc1, timestamp=1432088038576,
> > > >  value=\x08000aHf92acb5b-079a-42bc-913a-657f270a3dc1\x0EFAILURE\x0CNE-858\x140-0000-000\x02\x96\x01SXOAXTPSIUFPPNUCIEVQGCIZHCEJBKGWINHKIHFRHWHNATAHAHQBFRAYLOAMQEGKLNZIFM
> > > > 000a
> > > > 1 row(s) in 12.6720 seconds
> > > >
> > > > hbase(main):010:0> get 'sensors', '000a', {COLUMN =>
> > > > 'v:f92acb5b-079a-42bc-913a-657f270a3dc1'}
> > > > COLUMN                                   CELL
> > > >  v:f92acb5b-079a-42bc-913a-657f270a3dc1  timestamp=1432088038576,
> > > >  value=\x08000aHf92acb5b-079a-42bc-913a-657f270a3dc1\x0EFAILURE\x0CNE-858\x140-0000-000\x02\x96\x01SXOAXTPSIUFPPNUCIEVQGCIZHCEJBKGWINHKIHFRHWHNATAHAHQBFRAYLOAMQEGKLNZIFM
> > > > 000a
> > > >
> > > > 1 row(s) in 0.0280 seconds
> > > >
> > > >
> > > > They both return the same result. However, the get returns in 28ms while
> > > > the scan returns in 12672ms.
> > > >
> > > > How can the scan be that slow? Is it normal? If I remove the QC from
> > > > the scan, then it takes only 250ms to return all the columns. I think
> > > > something is not correct.
> > > >
> > > > I'm running on 1.0.0-cdh5.4.0 so I guess it's the same for 1.0.x...
> > > >
> > > > JM
> > > >
> > >
> >
>
