hbase-user mailing list archives

From Peter Haidinyak <phaidin...@local.com>
Subject RE: HBase design schema
Date Mon, 04 Apr 2011 18:23:52 GMT
I've done almost the same thing at my work. Since I'm running on a VERY small number of servers
(2), I pre-aggregate my data into tables in the format...

[YYYY-MM-DD]|[Keyword]|[Referrer]  for the row key

And then for the data column I store the hit count for that referrer. This approach has a
problem during insert: with the date at the front of the key, writes for a given date usually
all go to one server. The upside is that during a client scan you can set the start and end
rows, such as startRow = '2011-03-05|hospital| ' and endRow = '2011-03-05|hospital|~'. This
will return all of the referrers for the keyword hospital on 2011-03-05.
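
A minimal sketch of the idea in plain Python, not actual HBase client code: it pre-aggregates
some made-up hits into rows keyed as [YYYY-MM-DD]|[Keyword]|[Referrer], keeps the keys in
byte order the way HBase sorts row keys, and emulates a scan with an inclusive start row and
an exclusive stop row. The sample data and the names make_key, table, and scan are
hypothetical and exist only for this illustration.

```python
from bisect import bisect_left
from collections import Counter


def make_key(date, keyword, referrer):
    """Build a row key in the [YYYY-MM-DD]|[Keyword]|[Referrer] layout."""
    return f"{date}|{keyword}|{referrer}".encode("ascii")


# Hypothetical raw hits: (date, keyword, referrer).
raw_hits = [
    ("2011-03-04", "hospital", "google.com"),
    ("2011-03-05", "hospital", "google.com"),
    ("2011-03-05", "hospital", "bing.com"),
    ("2011-03-05", "hospital", "google.com"),
    ("2011-03-05", "hospitality", "google.com"),
    ("2011-03-05", "hotel", "google.com"),
    ("2011-03-06", "hospital", "google.com"),
]

# Pre-aggregate: one row per key, the value is the hit count.
table = Counter(make_key(*hit) for hit in raw_hits)

# Row keys sorted as raw bytes, which is the order a scan walks them in.
sorted_keys = sorted(table)


def scan(start_row, stop_row):
    """Emulate a range scan: start row inclusive, stop row exclusive."""
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return [(key, table[key]) for key in sorted_keys[lo:hi]]


# All referrers for the keyword "hospital" on 2011-03-05.
results = scan(b"2011-03-05|hospital| ", b"2011-03-05|hospital|~")
for key, count in results:
    print(key.decode("ascii"), count)
```

Note that the ' ' and '~' bounds only bracket referrers made of printable ASCII characters
that sort below '~'. The trailing '|' after the keyword also matters: it keeps rows for
"hospitality" out of a scan for "hospital".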

YMMV

-Pete

From: Miguel Costa [mailto:miguel-costa@telecom.pt]
Sent: Monday, April 04, 2011 9:12 AM
To: user@hbase.apache.org
Subject: HBase design schema

Hi,

I need some help with a schema design on HBase.

I have 5 dimensions (Time, Site, Referrer, Keyword, Country).
My row key is Site+Time.

Now I want to answer some questions like what is the top Referrer by Keyword for a site over
a period of time.
Basically I want to cross all the dimensions that I have. And what if I have 30 dimensions?

What is the best schema design?

Please let me know if this isn't the right mailing list.

Thank you for your time.

Miguel


