hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Hbase Scan - number of columns make the query performance way different
Date Thu, 13 Sep 2012 14:29:47 GMT

Hi there, I don't know the specifics of your environment, but ...

11.8.2. Scan Attribute Selection

...describes paying attention to the number of columns you return,
particularly when using HBase as a MapReduce source. In short, returning only
the columns you need reduces the data transferred between the RegionServer
and the client, the number of KeyValues the RegionServer has to evaluate, and so on.
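To make the KeyValue point concrete, here is a rough Java sketch that estimates how many KeyValues and bytes a scan of the 1,000,000-row users table from the quoted message returns with all 15 columns versus only userid and username. It assumes the classic KeyValue serialization, in which every cell carries its own copy of the row key, family, qualifier, timestamp and type. The row key, the filler qualifier names beyond the three mentioned in the question, and the 20-byte value size are all hypothetical. It ignores compression, HFile block reads and RPC framing, so treat the numbers as illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ScanWidthEstimate {

    // Serialized size of one cell in the classic KeyValue layout (no tags):
    // keylength(4) + valuelength(4) + key + value, where
    // key = rowlength(2) + row + familylength(1) + family + qualifier + timestamp(8) + type(1).
    static long keyValueSize(String row, String family, String qualifier, int valueLength) {
        int rowLen = row.getBytes(StandardCharsets.UTF_8).length;
        int famLen = family.getBytes(StandardCharsets.UTF_8).length;
        int qualLen = qualifier.getBytes(StandardCharsets.UTF_8).length;
        long keyLength = 2 + rowLen + 1 + famLen + qualLen + 8 + 1;
        return 4 + 4 + keyLength + valueLength;
    }

    // Bytes returned for one row when the scan selects the given qualifiers.
    // Every selected column is a separate KeyValue that repeats the row key.
    static long rowSize(String row, String family, List<String> qualifiers, int valueLength) {
        long total = 0;
        for (String qualifier : qualifiers) {
            total += keyValueSize(row, family, qualifier, valueLength);
        }
        return total;
    }

    // Hypothetical schema: the three names from the question plus made-up filler names.
    static List<String> hypotheticalQualifiers(int count) {
        List<String> qualifiers = new ArrayList<>(List.of("userid", "username", "email"));
        for (int i = qualifiers.size(); i < count; i++) {
            qualifiers.add("field" + i);
        }
        return qualifiers;
    }

    public static void main(String[] args) {
        String sampleRow = "user0000001"; // hypothetical 11-byte row key
        int valueLength = 20;             // assumed average value size in bytes
        long rows = 1_000_000L;           // row count from the question

        List<String> allColumns = hypotheticalQualifiers(15);    // like: scan 'users'
        List<String> twoColumns = List.of("userid", "username"); // like: COLUMNS => [...]

        long fullBytes = rows * rowSize(sampleRow, "cf", allColumns, valueLength);
        long narrowBytes = rows * rowSize(sampleRow, "cf", twoColumns, valueLength);

        System.out.printf("All 15 columns: %,d KeyValues, about %,d bytes%n",
                rows * allColumns.size(), fullBytes);
        System.out.printf("Two columns:    %,d KeyValues, about %,d bytes%n",
                rows * twoColumns.size(), narrowBytes);
    }
}
```

With these made-up sizes the full-width scan returns 15,000,000 KeyValues (891 bytes per row) against 2,000,000 (120 bytes per row), roughly 7.4 times as much. Because each cell repeats the row key and column coordinates, the cost grows with the number of columns returned, not just with the size of the values.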

On 9/13/12 10:12 AM, "Shengjie Min" <kelvin.msj@gmail.com> wrote:

>I found an interesting difference between HBase scan queries.
>I have an HBase table with a lot of columns in a single column family.
>E.g., let's say I have a users table: userid, username, email, etc.,
>15 fields altogether, all in the single column family.
>If you are familiar with RDBMSs:
>query 1: select * from users
>query 2: select userid, username from users
>In MySQL these two differ (query 2 will obviously be faster),
>but the two queries won't show a huge difference in performance.
>In HBase, I noticed that:
>query 3: scan 'users'    # returns all 15 fields
>query 4: scan 'users', {COLUMNS => ['cf:userid', 'cf:username']}    # returns
>only two fields: userid, username
>Given a big data set, query 3 takes way longer than query 4. In my
>test I have around 1,000,000 user records, and we are talking about 100 secs
>for query 3 vs. a few secs for query 4.
>Can anybody explain why the width of the result set in HBase can
>impact performance that much?
>Shengjie Min
