hbase-user mailing list archives

From tsuna <tsuna...@gmail.com>
Subject Re: Time-series schema
Date Fri, 29 Oct 2010 21:16:25 GMT
On Fri, Oct 29, 2010 at 8:20 AM, Brian O'Kennedy <brokenn@gmail.com> wrote:
> So, from your description below I believe I can come up with a design that
> does ONE of these two queries very well, but the other very badly. Is there
> a way to have the best of both without having to implement both separately?

It really depends on the specific details of the problem at hand, but
generally the answer is no.

> And if I do so, do I lose all ability to update this database in an atomic
> fashion? (ie, insert a bunch of new data for some timestamp)

With HBase you can only get atomicity on a per-row basis, not across
rows, let alone across tables[*].

You might be interested in a distributed time series database system
that we, at StumbleUpon, are going to open-source in a few days:
http://opentsdb.net/
However, this system doesn't support sub-second precision right now,
and it sounds like you'd need that.  But maybe you can hack it to use
a few extra bytes per timestamp and store timestamps with millisecond
precision.
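
To give a rough idea of what "a few extra bytes" could mean, here is a
minimal sketch in Python.  It assumes a 4-byte field for second
precision and an 8-byte field for millisecond precision; both widths
and both function names are made up for illustration and are not taken
from the OpenTSDB code.  Big-endian encoding is used so that the byte
order of the keys matches the numeric order of the timestamps.

```python
import struct

def encode_seconds(ts_seconds: int) -> bytes:
    """Hypothetical 4-byte, big-endian timestamp with second precision."""
    return struct.pack(">I", ts_seconds)

def encode_millis(ts_millis: int) -> bytes:
    """Hypothetical 8-byte, big-endian timestamp with millisecond precision."""
    return struct.pack(">Q", ts_millis)

# Two samples taken 250 ms apart collide at second precision
# but stay distinct (and correctly ordered) at millisecond precision.
t0_ms = 1288386985000
t1_ms = t0_ms + 250
same_second = encode_seconds(t0_ms // 1000) == encode_seconds(t1_ms // 1000)
ordered_millis = encode_millis(t0_ms) < encode_millis(t1_ms)
```

The cost in this sketch is 4 extra bytes per timestamp, which adds up
when the timestamp is part of every row key.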

In OpenTSDB, the row key is a composite of 3 things: (timestamp,
metric name, tags).  All read queries are done with scanners, as we
always know at least the start time and metric name.  Tags filtering
is done by the RegionServers with a server-side filter.
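
Here is a minimal sketch of how such a composite key and a scan over it
could work, written in plain Python with no HBase client involved.
Everything in it is an assumption made for illustration: the 3-byte
metric ID, the 4-byte timestamp, the "key=value;" tag encoding, and the
names row_key, scan_range and scan.  It also assumes the metric comes
first in the key, followed by the timestamp, so that one metric over
one time range is a contiguous range of rows.  The tag check stands in
for the server-side filter.

```python
import struct

METRIC_WIDTH = 3   # hypothetical fixed width of the metric ID, in bytes
TS_WIDTH = 4       # hypothetical width of the timestamp, in bytes

def row_key(metric_id: int, ts: int, tags: dict) -> bytes:
    """Build metric ID + timestamp + sorted tags as one byte string."""
    tag_bytes = b"".join(
        f"{k}={v};".encode() for k, v in sorted(tags.items()))
    return (metric_id.to_bytes(METRIC_WIDTH, "big")
            + struct.pack(">I", ts)
            + tag_bytes)

def scan_range(metric_id: int, start_ts: int, end_ts: int):
    """Start (inclusive) and stop (exclusive) keys for one metric."""
    prefix = metric_id.to_bytes(METRIC_WIDTH, "big")
    return (prefix + struct.pack(">I", start_ts),
            prefix + struct.pack(">I", end_ts))

def scan(rows, metric_id, start_ts, end_ts, tag_filter):
    """Emulate a scanner plus a tag filter over an iterable of row keys."""
    start, stop = scan_range(metric_id, start_ts, end_ts)
    wanted = {f"{k}={v}".encode() for k, v in tag_filter.items()}
    header = METRIC_WIDTH + TS_WIDTH
    result = []
    for key in sorted(rows):
        if not (start <= key < stop):
            continue  # outside the scanned key range
        tags = set(key[header:].split(b";"))
        if wanted <= tags:  # every requested tag is present on the row
            result.append(key)
    return result

rows = [
    row_key(1, 100, {"host": "a"}),
    row_key(1, 200, {"host": "b"}),
    row_key(1, 300, {"host": "a"}),
    row_key(2, 150, {"host": "a"}),
]
hits = scan(rows, 1, 100, 300, {"host": "a"})
```

In this sketch the scan only touches rows for metric 1 between
timestamps 100 (inclusive) and 300 (exclusive), and the tag filter then
drops the row for host b.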

We use OpenTSDB as our main monitoring system here at StumbleUpon, and
we keep track of over 200 000 time series and store over 100M data
points per day in our main production cluster.

Hope that helps,

  [*] That's actually not entirely true.  You can, within one table,
use explicit row locks to make atomic changes across rows.  There are
a lot of implications to be aware of before starting to use explicit
row locks, so using them is not recommended.

Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
