hbase-user mailing list archives

From mete <efk...@gmail.com>
Subject Re: key design
Date Tue, 22 May 2012 13:50:26 GMT

I wanted to use separate tables for each log type since they are
considerably big, around 100 GB per month, so it just seemed natural to
put them into different tables since I don't need to query them all
together.

Thanks for the heads-up.

On Mon, May 21, 2012 at 6:12 PM, Ian Varley <ivarley@salesforce.com> wrote:

> Mete,
> Why separate tables per log type? Why not a single table with the key:
> <log type><date>
> That's roughly the approach used by OpenTSDB (with "metric id" instead of
> "log type", but same idea). OpenTSDB goes further by "bucketing" values
> into rows using a base timestamp in the row key and offset timestamps in
> the column qualifiers, for more efficiency.
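>
> Roughly, that key layout might look like the following (a sketch only;
> the two-byte type id and hourly buckets are illustrative choices on my
> part, not what OpenTSDB literally does):
>
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class LogKeys {
>         // Row key = <2-byte log type id><8-byte hour bucket>.
>         // The offset within the hour goes into the column qualifier,
>         // so one row holds an hour of entries for one log type.
>         static byte[] rowKey(short logTypeId, long tsMillis) {
>             long hourBucket = tsMillis - (tsMillis % 3600000L);
>             return Bytes.add(Bytes.toBytes(logTypeId),
>                              Bytes.toBytes(hourBucket));
>         }
>
>         // Qualifier = seconds offset from the start of the hour.
>         static byte[] qualifier(long tsMillis) {
>             return Bytes.toBytes((short) ((tsMillis % 3600000L) / 1000));
>         }
>     }
>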
> If you start the key with log type, you can do partial scans for a
> specific date, but only within a single log type; to scan across all log
> types, you'd need to do multiple scans (one per log type). If you have a
> fixed and relatively small number of log types (less than 20, say), this
> could still be the best approach, but if it's a very frequent operation to
> scan by time across all log types and you have a lot of log types, you
> might want to reconsider that.
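>
> For instance, scanning one log type over a time range is a single
> bounded scan (a sketch against the plain client API, reusing the
> hypothetical rowKey() helper above):
>
>     import java.io.IOException;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Result;
>     import org.apache.hadoop.hbase.client.ResultScanner;
>     import org.apache.hadoop.hbase.client.Scan;
>
>     public class LogScans {
>         static void scanOneType(HTable table, short logTypeId,
>                                 long startMillis, long endMillis)
>                 throws IOException {
>             Scan scan = new Scan();
>             scan.setStartRow(LogKeys.rowKey(logTypeId, startMillis));
>             scan.setStopRow(LogKeys.rowKey(logTypeId, endMillis)); // exclusive
>             ResultScanner scanner = table.getScanner(scan);
>             try {
>                 for (Result r : scanner) {
>                     // process one hour-bucket row for this log type
>                 }
>             } finally {
>                 scanner.close();
>             }
>         }
>     }
>
> Covering all N log types means calling that N times, once per type id;
> there is no single contiguous key range across types.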
> The case for using a hash as the start of the key is really just to avoid
> region server "hot spotting" (where, even though you have lots of machines,
> all your insert traffic is going to one of them because all inserts are
> happening "now" and only one region server contains the range that "now" is
> in). Salting or hashing a timestamp-based key spreads that out so the load
> is evenly distributed; but it prevents you from doing linear scans over the
> time dimension. That's why OpenTSDB (and similar models) start the key with
> another value that "spreads" the data over all servers.
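>
> As a sketch (my illustration, not OpenTSDB's code), a salted key might
> be built like this; note the read-side cost in the trailing comment:
>
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class SaltedKeys {
>         static final int SALT_BUCKETS = 16; // illustrative; size to cluster
>
>         // Prefix the time-based key with one salt byte so writes that
>         // all happen "now" land on different regions. Hashing the key
>         // itself keeps the salt reproducible at read time.
>         static byte[] saltedRowKey(byte[] timeKey) {
>             byte salt = (byte) ((Bytes.hashCode(timeKey) & 0x7fffffff)
>                                 % SALT_BUCKETS);
>             return Bytes.add(new byte[] { salt }, timeKey);
>         }
>     }
>
> The price: a time-range scan now needs SALT_BUCKETS separate scans,
> one per salt prefix, merged on the client.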
> Ian
> On May 21, 2012, at 7:56 AM, mete wrote:
> > Hello folks,
> >
> > I am trying to come up with a nice key design for storing logs in the
> > company. I am planning to index them and store the row key in the index
> > for random reads.
> >
> > I need to balance the writes equally between the region servers, and I
> > could not understand how OpenTSDB does that by prefixing the metric id.
> > (I related the metric id to the log type.) In my log storage case a log
> > line just has a type and a date, and the rest of it is not really very
> > useful information.
> >
> > So I think that I can create a table for every distinct log type, and I
> > need a random salt to route to a different region server, similar to this:
> > <salt>-<date>
> >
> > But with this approach I believe I will lose the ability to do effective
> > partial scans to a specific date (if for some reason I need that). What do
> > you think? And for the salt approach, do you use randomly generated salts
> > or hashes that actually mean something (like the hash of the date)?
> >
> > I am using random UUIDs at the moment, but I am trying to find a better
> > approach; any feedback is welcome.
> >
> > cheers
> > Mete
