hbase-user mailing list archives

From James Taylor <jtay...@salesforce.com>
Subject Re: hbase key design to efficient query on base of 2 or more column
Date Mon, 19 May 2014 16:06:03 GMT
If you use Phoenix, queries would leverage our Skip Scan:

Assuming a row key made up of a low cardinality first value (like a byte
representing an enum), followed by a high cardinality second value (like a
date/time value) you'd get a large benefit from the skip scan when you're
only looking at a small sliver of your time range.
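
To make the idea concrete, here is a minimal sketch of skip-scan-style access over an in-memory sorted list of keys. It is not Phoenix's implementation; the function name `skip_scan`, the tuple keys, and the sample data are all assumptions made for illustration. For each leading value it seeks straight to the start of the requested time range and stops at the end of it, so rows outside the sliver are never read.

```python
from bisect import bisect_left


def skip_scan(sorted_keys, leading_values, start, stop):
    """Yield keys (lead, ts) with lead in leading_values and start <= ts < stop.

    sorted_keys is a sorted list of (lead, ts) tuples, standing in for
    row keys stored in sorted order (hypothetical layout).
    """
    for lead in sorted(leading_values):
        # Seek directly to the first key >= (lead, start).
        i = bisect_left(sorted_keys, (lead, start))
        # Read until we leave the [start, stop) range for this leading value.
        while i < len(sorted_keys) and sorted_keys[i] < (lead, stop):
            yield sorted_keys[i]
            i += 1


# Hypothetical data: 3 enum values, 4 dates each.
keys = sorted(
    (cat, day)
    for cat in (0, 1, 2)
    for day in ("2014-01-15", "2014-03-02", "2014-05-10", "2014-05-16")
)

# Only the slice 2014-05-01 <= date < 2014-05-16 is touched per enum value.
hits = list(skip_scan(keys, {0, 1, 2}, "2014-05-01", "2014-05-16"))
```

With a low cardinality leading value, the number of seeks stays small (one per enum value here), which is where the benefit comes from.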

Another option would be to create a secondary index over your second+first
column: http://phoenix.incubator.apache.org/secondary_indexing.html


On May 19, 2014, at 6:47 AM, Shushant Arora <shushantarora09@gmail.com>

OK, but what if I have 2 multivalued dimensions on which I have to analyse
the number of users? Say category can have 50 values and another dimension
is the user's country (say 100+ values). I need a weekly count by category
and country, plus an overall distinct user count by category and country.

How can I achieve this in HBase?

On Mon, May 19, 2014 at 3:11 PM, Michael Segel <michael_segel@hotmail.com>

The point is that a field that has a small finite set of values is not a
good candidate for indexing using an inverted table or b-tree etc …

I’d say that you’re actually going to be better off using a scan with a
start and stop row, then doing the counts on the client side.

So as you get back your result set… you process the data. (Either in a M/R
job or single client thread.)
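
A rough sketch of the scan-then-count approach, simulated over a sorted in-memory list rather than a real HBase table. The key layout `date#category#userid`, the helper `range_scan`, and the sample rows are assumptions for illustration only. The stop row is exclusive, as with an HBase scan, so the day after the last wanted date is used.

```python
from bisect import bisect_left
from collections import Counter


def range_scan(sorted_keys, start_row, stop_row):
    """Return keys k with start_row <= k < stop_row (stop row is exclusive)."""
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


# Hypothetical row keys: date#category#userid, kept in sorted order.
rows = sorted([
    "2013-12-30#Normal#u1",
    "2014-01-01#Supreme#u2",
    "2014-02-11#Normal#u3",
    "2014-02-20#Supreme#u4",
    "2014-05-16#Medium#u5",
    "2014-05-17#Supreme#u6",
])

# New users from 2014-01-01 through 2014-05-16, counted on the client.
by_category = Counter()
supreme_by_month = Counter()
for key in range_scan(rows, "2014-01-01", "2014-05-17"):
    date, category, _user = key.split("#")
    by_category[category] += 1
    if category == "Supreme":
        supreme_by_month[date[:7]] += 1  # bucket by YYYY-MM
```

A single pass over the scanned range answers both the category-wise count and the per-month breakdown, so neither needs its own index.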


On May 19, 2014, at 8:48 AM, Shushant Arora <shushantarora09@gmail.com>


I cannot apply a server side filter.

The 2nd requirement is not just to get users in the Supreme category, but
rather the distribution of users category wise.

1. How many of Supreme, how many of Normal and how many of Medium till

On Mon, May 19, 2014 at 12:58 PM, Michael Segel



BAD BOY. This isn’t a good idea for a secondary index.

You have a row key (primary index) which is time.

The secondary is a filter… with 3 choices.

HINT: Do you really want a secondary index based on a field that only has
3 choices for a value?

What are they teaching in school these days?

How about applying a server side filter?  ;-)

On May 18, 2014, at 12:33 PM, John Hancock <jhancock1975@gmail.com>



Here's one idea; there might be better ways.

Take a look at Phoenix; it supports secondary indexing:



On Sat, May 17, 2014 at 8:34 AM, Shushant Arora



I have a requirement to query my data based on date and user category.

User category can be Supreme, Normal, or Medium.

I want to query how many new users there are in my table from date
(2014-01-01) to (2014-05-16), category wise.

Another requirement is to query how many users of the Supreme category
there are in my table, broken down by the month in which they came.

What should my key be?

1. If I take the key as a combination of date#category, I cannot query based

2. If I take the key as category#date, I cannot query based on date.


