hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: HBase design schema
Date Mon, 04 Apr 2011 22:30:17 GMT
OpenTSDB does an interesting thing where they put a primary key in front of
the date.  This limits some of the hot-spotting on inserts.  Each different
kind of query goes to a different machine as well.  The query balancing
won't be as good as the insert balancing since some queries are much more
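
A minimal sketch of the key-ordering idea above, not OpenTSDB's actual key format: it assumes the keyword from this thread plays the role of the leading field, and the helper names (`date_first_key`, `keyword_first_key`, `todays_positions`) are hypothetical. It only shows where one day's writes land in sorted key order, which is what determines which region takes the inserts.

```python
# Hypothetical helpers: compare where one day's writes fall in sorted key order.

def date_first_key(date, keyword, referrer):
    # Date leads the key, so all of one day's rows sort next to each other.
    return f"{date}|{keyword}|{referrer}".encode("utf-8")

def keyword_first_key(date, keyword, referrer):
    # Another field leads the key, so one day's rows are spread across keywords.
    return f"{keyword}|{date}|{referrer}".encode("utf-8")

# Made-up sample rows: (date, keyword, referrer).
rows = [
    ("2011-03-04", "clinic", "google.com"),
    ("2011-03-05", "clinic", "bing.com"),
    ("2011-03-04", "hospital", "google.com"),
    ("2011-03-05", "hospital", "yahoo.com"),
]

def todays_positions(make_key, today):
    """Return the positions, in sorted key order, of the rows dated `today`."""
    ordered = sorted(rows, key=lambda r: make_key(*r))
    return [i for i, r in enumerate(ordered) if r[0] == today]

print(todays_positions(date_first_key, "2011-03-05"))     # adjacent positions
print(todays_positions(keyword_first_key, "2011-03-05"))  # spread positions
```

With the date first, the current day's rows sit together at the end of the key space; with another field first, they are interleaved with older rows, so inserts are no longer concentrated in one key range.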

On Mon, Apr 4, 2011 at 11:23 AM, Peter Haidinyak <phaidinyak@local.com> wrote:

> I've done almost the same thing at my work. Since I'm running on a VERY
> small number of servers (2), I pre-aggregate my data into tables in the
> format...
> [YYYY-MM-DD]|[Keyword]|[Referrer]  for the row key
> And then for the data column I store the hit count for that referrer. This
> approach has a problem during insert: with the date at the front of the
> key, the writes usually all go to one server. The upside is that during a
> client scan you can set the start and end row, such as startRow =
> '2011-03-05|hospital| ' and endRow = '2011-03-05|hospital|~'. This will
> return all of the referrers for the keyword hospital for the date
> of 2011-03-05.
> -Pete
> From: Miguel Costa [mailto:miguel-costa@telecom.pt]
> Sent: Monday, April 04, 2011 9:12 AM
> To: user@hbase.apache.org
> Subject: HBase design schema
> Hi,
> I need some help with a schema design on HBase.
> I have 5 dimensions (Time, Site, Referrer, Keyword, Country).
> My row key is Site+Time.
> Now I want to answer some questions like what is the top Referrer by
> Keyword for a site on a Period of Time.
> Basically I want to cross all the dimensions that I have. And what if I
> have 30 dimensions?
> What is the best schema design?
> Please let me know  if this isn't the right mailing list.
> Thank you for your time.
> Miguel
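
The pre-aggregated, date-first row key and the start/end row scan described in the quoted reply can be sketched roughly as follows. This is a pure-Python simulation, not HBase client code: a sorted list stands in for the table, the `row_key` and `scan` helpers and the sample hits are hypothetical, and the scan is assumed to include the start row and exclude the end row.

```python
import bisect
from collections import Counter

def row_key(date, keyword, referrer):
    # Key layout from the thread: [YYYY-MM-DD]|[Keyword]|[Referrer]
    return f"{date}|{keyword}|{referrer}".encode("utf-8")

# Made-up raw hits: (date, keyword, referrer).
raw_hits = [
    ("2011-03-04", "hospital", "google.com"),
    ("2011-03-05", "clinic", "bing.com"),
    ("2011-03-05", "hospital", "bing.com"),
    ("2011-03-05", "hospital", "google.com"),
    ("2011-03-05", "hospital", "google.com"),
    ("2011-03-05", "hotel", "google.com"),
]

# Pre-aggregate: one row per key, with the hit count as the stored value.
counts = Counter(row_key(*hit) for hit in raw_hits)
sorted_keys = sorted(counts)  # stands in for a table sorted by row key

def scan(start_row, stop_row):
    """Return (key, count) pairs with start_row <= key < stop_row."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = bisect.bisect_left(sorted_keys, stop_row)
    return [(k, counts[k]) for k in sorted_keys[lo:hi]]

# Scan bounds taken from the thread: ' ' sorts before and '~' sorts after
# the letters and digits that typically start a referrer name.
hits = scan(b"2011-03-05|hospital| ", b"2011-03-05|hospital|~")
for key, count in hits:
    print(key.decode("utf-8"), count)
```

The bounds work because row keys are compared byte by byte; a referrer starting with a byte at or above '~' would fall outside this range, so the end marker is an assumption about the data.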
