hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Hbase multiget vs scan with RowFilter
Date Thu, 05 Feb 2015 14:57:46 GMT
As Lars mentioned, for the scan with RowFilter approach, set the start and
stop rows properly so that the number of rows scanned is limited.

Probably you can play with both approaches using sample data to find out
which one is faster.
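
To get a feel for the difference before benchmarking against a real cluster,
here is a toy model in plain Java. It is not HBase client code: it assumes a
TreeMap can stand in for a sorted table, and the class and method names are
made up for illustration. It counts rows touched rather than measuring time,
so it only shows the shape of the trade-off Lars describes below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: a TreeMap stands in for a sorted HBase table.
class ScanVsMultiGetModel {

    // Builds a sorted "table" with keys row00000 .. row(n-1).
    static NavigableMap<String, String> sampleTable(int n) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            table.put(String.format(Locale.ROOT, "row%05d", i), "value" + i);
        }
        return table;
    }

    // Scan with a row filter: every row in [startRow, stopRow) is read;
    // the filter only decides which of those rows are returned.
    static int rowsReadByScan(NavigableMap<String, String> table,
                              String startRow, String stopRow,
                              Set<String> wanted, List<String> out) {
        int read = 0;
        for (String key : table.subMap(startRow, true, stopRow, false).keySet()) {
            read++;
            if (wanted.contains(key)) {
                out.add(key);
            }
        }
        return read;
    }

    // Multi-get: one point lookup (a "seek") per requested key.
    static int seeksByMultiGet(NavigableMap<String, String> table,
                               List<String> keys, List<String> out) {
        int seeks = 0;
        for (String key : keys) {
            seeks++;
            if (table.containsKey(key)) {
                out.add(key);
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable(1000);
        List<String> keys = Arrays.asList("row00010", "row00500", "row00990");
        Set<String> wanted = new HashSet<>(keys);

        // No useful bounds: the scan reads the whole table.
        int full = rowsReadByScan(table, "row00000", "row99999", wanted, new ArrayList<>());
        // Tight bounds around the wanted keys: fewer rows are read.
        int bounded = rowsReadByScan(table, "row00010", "row00991", wanted, new ArrayList<>());
        int seeks = seeksByMultiGet(table, keys, new ArrayList<>());

        System.out.println("unbounded scan, rows read: " + full);
        System.out.println("bounded scan, rows read:   " + bounded);
        System.out.println("multi-get, seeks:          " + seeks);
    }
}
```

Both approaches return the same three rows; what differs is how many rows are
touched. In real HBase a seek costs much more than reading the next row in a
scan, so the counts alone do not decide which is faster.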


On Wed, Feb 4, 2015 at 10:07 PM, lars hofhansl <larsh@apache.org> wrote:

> It depends.
> A scan will always scan all rows between the passed start and stop rows
> (or all rows when none were passed). A Filter can filter rows out, but
> they will all be read in. A MultiGet does a seek (in a sense) for each Get.
> So if the set of Gets in the MultiGet is very small compared to the total
> number of rows you need to scan, you'll be better off with that. If you can
> limit the set of rows to scan (i.e. there is a start and stop row not too
> far apart), a scan is faster.
> The numbers depend on many variables, but Scan'ning a row in a larger set
> is probably 100-1000x faster than Get'ing a single row.
> -- Lars
> From: alokob <alokob.be@gmail.com>
> To: user@hbase.apache.org
> Sent: Wednesday, February 4, 2015 8:53 PM
> Subject: Re: Hbase multiget vs scan with RowFilter
> Thanks Ted for your reply.
> I was under the impression that a scan with RowFilter would give better
> performance, since in a multi-get each Get would be treated as an
> independent scan. Or do you mean that the scan would be a full table scan,
> whereas the multi-get would return as soon as it finds each row without
> continuing to scan, so the multi-get would be more efficient?
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Hbase-multiget-vs-scan-with-RowFilter-tp4068066p4068108.html
> Sent from the HBase User mailing list archive at Nabble.com.
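
Lars's rough 100-1000x figure quoted above can be turned into a
back-of-the-envelope break-even estimate. The sketch below assumes a simple
linear cost model (one unit per scanned row, a fixed multiple of that per
Get), which is a simplification and not something measured; the class and
method names are hypothetical.

```java
// Back-of-the-envelope estimate only; the linear cost model is an assumption.
class ScanGetBreakEven {

    // If one Get costs `getToScanCostRatio` times as much as scanning one row,
    // then numGets Gets cost about as much as scanning this many rows.
    static long breakEvenScanRows(long numGets, long getToScanCostRatio) {
        return Math.multiplyExact(numGets, getToScanCostRatio);
    }

    // True when a scan over rowsInScanRange rows is estimated to be cheaper
    // than issuing numGets individual Gets.
    static boolean scanLikelyFaster(long rowsInScanRange, long numGets,
                                    long getToScanCostRatio) {
        return rowsInScanRange < breakEvenScanRows(numGets, getToScanCostRatio);
    }

    public static void main(String[] args) {
        long numGets = 50;
        long rowsInRange = 10_000;
        // Try both ends of the 100-1000x range from the thread.
        for (long ratio : new long[] {100, 1000}) {
            System.out.println("ratio " + ratio
                    + ": break-even at " + breakEvenScanRows(numGets, ratio)
                    + " rows, scan likely faster: "
                    + scanLikelyFaster(rowsInRange, numGets, ratio));
        }
    }
}
```

The answer flips between the two ends of the range for the same workload,
which is why measuring with sample data, as Ted suggests, is the safer guide.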
