hbase-user mailing list archives

From "Rendon, Carlos (KBB)" <CRen...@kbb.com>
Subject RE: Help with row and column design
Date Tue, 29 Apr 2014 23:09:23 GMT
I've created a similar system using a rowkey like: (hash of date)-date
The downside is that inserts still hotspot, because every write for the current time bucket
lands on the same row, but reads over a range of time do not. My use case was geared towards
speeding up lots of reads.
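
A minimal sketch of that key layout, in Python for brevity. The MD5 hash, the 4-character
prefix, and the name make_row_key are assumptions for illustration, not details of my
actual system; any stable hash would do.

```python
import hashlib
from datetime import datetime


def make_row_key(ts: datetime, fmt: str = "%Y%m%d%H", prefix_len: int = 4) -> bytes:
    """Build a '(hash of date)-date' row key for the time bucket containing ts.

    fmt picks the bucket granularity (hourly by default). The prefix is the
    first prefix_len hex characters of the MD5 of the date string (an assumed
    choice), so consecutive buckets sort far apart in the key space.
    """
    date_part = ts.strftime(fmt)
    prefix = hashlib.md5(date_part.encode("ascii")).hexdigest()[:prefix_len]
    return f"{prefix}-{date_part}".encode("ascii")


if __name__ == "__main__":
    # Two adjacent hours get unrelated prefixes, so a read over a time range
    # is spread across the key space instead of hitting one region.
    print(make_row_key(datetime(2014, 4, 29, 22)))
    print(make_row_key(datetime(2014, 4, 29, 23)))
```

Because the prefix is derived from the date itself, a reader can recompute the full key for
any bucket without a lookup. It does not spread writes within a single bucket.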

Column qualifiers are just the collection of items you are aggregating on. Values are increments.
In your case qualifiers might look like

c:usa, c:usa:sex:m, c:usa:sex:f, c:italy, c:italy:sex:m, c:italy:sex:f

Basically any combination of things you care about. This has the downside that you have to
determine what filters are available up front and not after the fact. The upside is querying
should be fast.
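
As a rough sketch, the qualifiers to increment for one page view could be generated like
this. The function name, the "login" attribute name, and the alphabetical ordering of
attributes are assumptions for illustration; the only real requirement is that writers and
readers agree on a fixed order.

```python
from itertools import combinations


def qualifiers_for_event(country: str, attrs: dict) -> list:
    """Return every qualifier one event should increment.

    Produces the bare country counter plus one counter per subset of the
    event's attributes, e.g. c:usa and c:usa:sex:m. Attribute names are
    sorted so the same combination always maps to the same qualifier.
    """
    names = sorted(attrs)
    quals = []
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            parts = ["c", country]
            for name in combo:
                parts += [name, attrs[name]]
            quals.append(":".join(parts))
    return quals


if __name__ == "__main__":
    for q in qualifiers_for_event("usa", {"sex": "m", "login": "online"}):
        print(q)
```

Note that the number of qualifiers per event doubles with each attribute you add (2^n for n
attributes), which is the cost of deciding the filters up front.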

Computing counts over time is a batch of gets, one per row key, which you can build from the
list of dates/times that you care about. For each qualifier you would sum across all of the
row results.
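
The read side can be sketched as follows. The HBase calls themselves are left out: each row
result is modelled as a plain dict of qualifier to count, and the function names are
hypothetical. In practice each date string would be turned into a full row key (hash prefix
included) and fetched with a batch of gets.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_buckets(start: datetime, end: datetime):
    """Yield a YYYYMMDDHH string for each hour in the half-open range [start, end)."""
    current = start
    while current < end:
        yield current.strftime("%Y%m%d%H")
        current += timedelta(hours=1)


def sum_counts(rows) -> Counter:
    """Sum per-qualifier counts across row results (dicts of qualifier -> int)."""
    totals = Counter()
    for row in rows:
        totals.update(row)  # Counter.update adds counts rather than replacing them
    return totals


if __name__ == "__main__":
    buckets = list(hourly_buckets(datetime(2014, 4, 29, 22), datetime(2014, 4, 30, 1)))
    print(buckets)
    # Stand-in for the results of the batch get; an hour with no traffic is an empty row.
    rows = [{"c:usa": 3, "c:usa:sex:m": 1}, {"c:usa": 2}, {}]
    print(sum_counts(rows))
```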

Hope this gives you some ideas,

Carlos

-----Original Message-----
From: Software Dev [mailto:static.void.dev@gmail.com] 
Sent: Tuesday, April 29, 2014 3:51 PM
To: user@hbase.apache.org
Subject: Re: Help with row and column design

Someone mentioned hotspotting in another post. I guess I could reverse the row keys
to prevent this?

On Tue, Apr 29, 2014 at 3:34 PM, Software Dev <static.void.dev@gmail.com> wrote:
> Hey all. I have some questions regarding row key and column design.
>
> We want to calculate some metrics based on our page views broken down 
> by hour, day, month and year. We also want this broken down by country 
> and have the ability to filter by some other attributes such as the 
> sex of the user or whether or not the user is logged in. Note 
> these will all be increments.
>
> So we have the initial row key design as
>
> YYYY - Row key for yearly totals
> YYYYMM - Row key for monthly totals
> YYYYMMDD - Row key for daily totals
> YYYYMMDDHH - Row key for hourly totals
>
> I think this may make sense as it will be easy to do a range scan over 
> a time period.
>
> Now for my column design. We were thinking along these lines.
>
> daily:US  - Daily counts for the US
> hourly:CA - Hourly counts for Canada
> ... and so on
>
> Now this seems like it would work but fails when we add in the 
> requirement of filtering results based on some other attributes. Say we 
> wanted to be able to filter based on sex (M or F) and/or filter based 
> on logged in status (Online or Offline) and/or filter based on some 
> other attribute OR perform no filtering at all. How would I go about 
> accomplishing this?
>
> Thanks for any input/pointers.