hbase-user mailing list archives

From Shengjie Min <kelvin....@gmail.com>
Subject Hbase Scan - number of columns make the query performance way different
Date Thu, 13 Sep 2012 14:12:48 GMT

I found an interesting performance difference between two HBase scan queries.

I have an HBase table with a lot of columns in a single column family.
For example, say I have a users table with 15 fields in total (userid,
username, email, etc.), all in a single column family.

If you are familiar with an RDBMS, compare:

query 1: select * from users
query 2: select userid, username from users

In MySQL, these two queries differ: query 2 will obviously be faster,
but the performance gap between them is not huge.

In HBase, I noticed the following:

query 3: scan 'users'    // returns all 15 fields
query 4: scan 'users', {COLUMNS=>['cf:userid','cf:username']}    // returns only two fields: userid, username

Query 3 takes far longer than query 4 on a big data set. In my test, with
around 1,000,000 user records, query 3 took about 100 seconds versus a few
seconds for query 4.
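
To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python (not HBase code) of how many cells each scan has to return. It assumes every row has all 15 columns populated and only one version per cell, which may not hold for the real table; the function name is made up for illustration.

```python
# Rough cell-count estimate for the two scans described above.
# Assumptions (not measured): every row has all 15 columns populated,
# and each cell has a single version.

ROWS = 1_000_000          # approximate number of user records in the test
ALL_COLUMNS = 15          # columns returned by query 3 (full scan)
SELECTED_COLUMNS = 2      # columns returned by query 4 (userid, username)


def cells_returned(rows: int, columns_per_row: int) -> int:
    """Number of cells a scan returns if each row yields columns_per_row cells."""
    return rows * columns_per_row


cells_full = cells_returned(ROWS, ALL_COLUMNS)
cells_projected = cells_returned(ROWS, SELECTED_COLUMNS)
ratio = cells_full / cells_projected

print(f"query 3 cells: {cells_full}")
print(f"query 4 cells: {cells_projected}")
print(f"ratio: {ratio}")
```

Under these assumptions, query 3 returns 7.5 times as many cells as query 4, which is a smaller factor than the gap in the timings I observed.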

Can anybody explain why the width of the result set in HBase affects
performance that much?

Shengjie Min
