hbase-user mailing list archives

From Liam Slusser <lslus...@gmail.com>
Subject Re: Help with row and column design
Date Tue, 29 Apr 2014 23:15:44 GMT
Here are some links that helped me design my keys...

http://www.appfirst.com/blog/best-practices-for-managing-hbase-in-a-high-write-environment/
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
http://hbase.apache.org/book/rowkey.design.html
http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html

Some fun bedtime reading.. :)

cheers,
liam



On Tue, Apr 29, 2014 at 3:51 PM, Software Dev <static.void.dev@gmail.com> wrote:

> Someone mentioned hotspotting in another post. I guess I could
> reverse the row keys to prevent this?
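
Reversing the key is one option, but consecutive hours then stop being adjacent, so time-ordered range scans are lost. Another option, described in the Sematext HBaseWD link above, is to prefix the key with a small bucket number. A minimal Python sketch of both key transforms follows. NUM_BUCKETS, the writer_id argument, and the helper names are assumptions made for illustration; the code only builds key strings and does not talk to a cluster.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for illustration


def reversed_key(time_key: str) -> str:
    """Reverse the key so the fastest-changing digits come first."""
    return time_key[::-1]


def salted_key(time_key: str, writer_id: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a bucket derived from a stable hash of the writer.

    writer_id is a hypothetical identifier (for example a host name) used
    to spread writes for the same period across several rows.
    """
    bucket = zlib.crc32(writer_id.encode("utf-8")) % buckets
    return f"{bucket:02d}-{time_key}"


def all_salted_keys(time_key: str, buckets: int = NUM_BUCKETS) -> list:
    """Every row a reader must fetch and sum to get the total for time_key."""
    return [f"{b:02d}-{time_key}" for b in range(buckets)]


print(reversed_key("2014042923"))          # 3292404102
print(salted_key("2014042923", "web-01"))  # one of the keys listed below
print(all_salted_keys("2014042923"))
```

With salting, a read for one period has to fetch the row from every bucket and add the counters together, so the bucket count trades write spread against read fan-out.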
>
> On Tue, Apr 29, 2014 at 3:34 PM, Software Dev <static.void.dev@gmail.com>
> wrote:
> > Hey all. I have some questions regarding row key and column design.
> >
> > We want to calculate some metrics based on our page views, broken down
> > by hour, day, month and year. We also want this broken down by country,
> > with the ability to filter by other attributes such as the sex of the
> > user or whether the user is logged in. Note that these will all be
> > increments.
> >
> > So we have the initial row key design as
> >
> > YYYY - Row key for yearly totals
> > YYYYMM - Row key for monthly totals
> > YYYYMMDD - Row key for daily totals
> > YYYYMMDDHH - Row key for hourly totals
> >
> > I think this may make sense as it will be easy to do a range scan over
> > a time period.
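
One thing to watch with this layout: HBase orders row keys lexicographically by bytes, so keys of different lengths in the same table interleave, and a scan over daily keys also returns hourly and monthly rows that sort inside the range. The sketch below emulates a [start, stop) scan over a sorted list of keys in plain Python to show the effect, then shows one possible fix, a granularity prefix. The scan helper and the "d:"/"h:"/"m:"/"y:" prefixes are assumptions for illustration, not HBase API.

```python
from bisect import bisect_left


def scan(sorted_keys, start, stop):
    """Emulate a [start, stop) range scan over lexicographically sorted keys."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Mixed granularities in one key space, as in the proposed design.
mixed = sorted([
    "2014", "201404", "20140429", "2014042923",
    "20140430", "201405", "20140501",
])
mixed_hits = scan(mixed, "20140429", "20140501")
print(mixed_hits)
# ['20140429', '2014042923', '20140430', '201405']

# Hypothetical fix: prefix each key with its granularity.
prefixed = sorted([
    "y:2014", "m:201404", "d:20140429", "h:2014042923",
    "d:20140430", "m:201405", "d:20140501",
])
daily_hits = scan(prefixed, "d:20140429", "d:20140501")
print(daily_hits)
# ['d:20140429', 'd:20140430']
```

Putting each granularity in its own table, or filtering the scan results by key length on the client, would be other ways to get the same separation.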
> >
> > Now for my column design. We were thinking along these lines.
> >
> > daily:US  - Daily counts for the US
> > hourly:CA - Hourly counts for Canada
> > ... and so on
> >
> > This seems like it would work, but it fails when we add the
> > requirement of filtering results based on other attributes. Say we
> > wanted to filter on sex (M or F), and/or on logged-in status (Online
> > or Offline), and/or on some other attribute, or to perform no
> > filtering at all. How would I go about accomplishing this?
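
One possible approach, sketched here under stated assumptions, is to keep a counter column for every combination of attribute values, using a wildcard for "any". Each page view then increments all of the matching columns, and a query with any subset of filters reads exactly one column. The "|" separator, the "*" wildcard, and both helper names are hypothetical; the sketch only builds column qualifier strings and makes no HBase calls.

```python
from itertools import product

ANY = "*"  # hypothetical wildcard meaning "no filter on this attribute"


def qualifiers_for_view(country, sex, login_state):
    """All column qualifiers to increment for a single page view.

    Each attribute contributes either its real value or the wildcard,
    giving 2**3 = 8 qualifiers for three attributes.
    """
    dims = [(country, ANY), (sex, ANY), (login_state, ANY)]
    return ["|".join(combo) for combo in product(*dims)]


def qualifier_for_query(country=ANY, sex=ANY, login_state=ANY):
    """The single column qualifier that answers a filtered query."""
    return "|".join((country, sex, login_state))


increments = qualifiers_for_view("US", "M", "Online")
print(increments)
print(qualifier_for_query(country="US", sex="M"))  # US|M|*
print(qualifier_for_query())                       # *|*|*
```

The number of increments per view doubles with each added attribute, so this suits a small, fixed set of filters. With many attributes, storing finer-grained counters and summing at read time may be the better trade.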
> >
> > Thanks for any input/pointers.
>
