hbase-user mailing list archives

From Shengjie Min <kelvin....@gmail.com>
Subject Re: Hbase Scan - number of columns make the query performance way different
Date Thu, 13 Sep 2012 14:35:25 GMT
In my case I am not feeding the HBase results to MapReduce; it's just a plain
HBase scan, and returning all columns vs. two columns makes a huge difference.
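
For reference, here is a rough back-of-the-envelope sketch of why the number
of returned columns matters. HBase stores every cell as its own KeyValue, and
each KeyValue repeats the row key, family, qualifier and timestamp, so a scan
that returns 15 columns has to evaluate and ship 15 KeyValues per row instead
of 2. The row-key length, value length and the "fieldNN" qualifier names below
are made-up assumptions for illustration, not measurements from the table in
this thread, and the model ignores compression, block encoding and RPC
batching.

```python
# Rough model of how much data a scan returns. All sizes are assumptions.

# Fixed bytes per serialized KeyValue: key length (4), value length (4),
# row length (2), family length (1), timestamp (8), key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_key, family, qualifier, value_len):
    """Approximate size in bytes of one KeyValue (no tags, no compression)."""
    return FIXED_OVERHEAD + len(row_key) + len(family) + len(qualifier) + value_len


def scan_cost(num_rows, qualifiers, row_key_len=16, family="cf", value_len=32):
    """Return (KeyValue count, approximate bytes) for a scan over `qualifiers`."""
    row_key = "x" * row_key_len  # placeholder row key of the assumed length
    per_row = sum(keyvalue_size(row_key, family, q, value_len) for q in qualifiers)
    return num_rows * len(qualifiers), num_rows * per_row


# 15 columns in one family; the 12 "fieldNN" names are hypothetical fillers.
all_columns = ["userid", "username", "email"] + ["field%02d" % i for i in range(12)]
two_columns = ["userid", "username"]

kv_all, bytes_all = scan_cost(1_000_000, all_columns)
kv_two, bytes_two = scan_cost(1_000_000, two_columns)

print(kv_all, kv_two)  # 15000000 2000000
print("approx. byte ratio: %.1f" % (bytes_all / bytes_two))
```

Under these assumptions the full scan touches 7.5 times as many KeyValues as
the two-column scan. That alone may not account for the whole gap I measured,
but it shows the direction of the effect.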

On 13 September 2012 15:29, Doug Meil <doug.meil@explorysmedical.com> wrote:

>
> Hi there, I don't know the specifics of your environment, but ...
>
> http://hbase.apache.org/book.html#perf.reading
> 11.8.2. Scan Attribute Selection
>
>
> ... describes paying attention to the number of columns you are returning,
> particularly when using HBase as a MR source.  In short, returning only
> the columns you need means you are reducing the data transferred between
> the RS and the client and the number of KV's evaluated in the RS, etc.
>
>
>
>
> On 9/13/12 10:12 AM, "Shengjie Min" <kelvin.msj@gmail.com> wrote:
>
> >Hi,
> >
> >I found an interesting difference between two HBase scan queries.
> >
> >I have an HBase table with a lot of columns in a single column family.
> >E.g., say I have a users table: userid, username, email, etc., 15 fields
> >altogether, all in the same column family.
> >
> >if you are familiar with RDBMS,
> >
> >query 1: select * from users
> >vs
> >query 2: select userid, username from users
> >
> >In MySQL these two differ: query 2 will obviously be faster, but the two
> >queries won't show a huge difference in performance.
> >
> >In HBase, I noticed that:
> >
> >query 3: scan 'users'    // this returns all 15 fields
> >vs
> >query 4: scan 'users', {COLUMNS=>['cf:userid','cf:username']}
> >// this returns only two fields: userid, username
> >
> >Query 3 takes far longer than query 4 on a big data set. In my test I have
> >around 1,000,000 user records, and query 3 takes about 100 secs vs. a few
> >secs for query 4.
> >
> >
> >Can anybody explain why the width of the result set in HBase affects
> >performance that much?
> >
> >
> >Shengjie Min
>
>
>


-- 
All the best,
Shengjie Min
