hbase-user mailing list archives

From Yves Langisch <y...@langisch.ch>
Subject Re: Schema design question
Date Mon, 18 Apr 2011 06:48:57 GMT
Yes, you're right. They have one row for each 10-minute period. Inside a row they work with offsets
in seconds within that 10-minute period. This leads to a maximum of 10*60 = 600 columns per row.
Normally you have fewer columns, as you don't have a datapoint for each second.
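
To make the layout concrete, here is a minimal sketch of how a timestamp could be split into a row base and a column offset. The function name and constant are hypothetical and only illustrate the arithmetic; this is not OpenTSDB's actual code, and the real row key also carries other fields that are left out here.

```python
PERIOD_SECONDS = 600  # 10-minute rows, as described above

def split_timestamp(ts, period=PERIOD_SECONDS):
    """Split a Unix timestamp (seconds) into (row_base, offset).

    row_base is the start of the period the timestamp falls into and
    would go into the row key; offset is the number of seconds into that
    period and would identify the column.
    """
    offset = ts % period
    return ts - offset, offset

# Example: a datapoint 537 seconds into its 10-minute period.
base, offset = split_timestamp(1303109337)
print(base, offset)
```

With period=3600 the same function gives the 60-minute variant, with offsets from 0 to 3599.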

So I wonder whether query performance could be improved with periods of 60 minutes, leading
to a maximum of 3600 columns per row, assuming that all columns are needed and no filtering is done.
Basically, the question is whether a wide (horizontal) design with many columns is better than a
tall (vertical) design with many rows for such a scenario.
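
For reference, a rough sketch of what the two period lengths mean for a scan over a fixed time range. It only counts rows and the upper bound on columns per row; it says nothing about which layout HBase actually serves faster. The helper name is made up for this example.

```python
def rows_touched(start, end, period):
    """Number of period-aligned rows overlapping the range [start, end)."""
    first = start - start % period
    last = (end - 1) - (end - 1) % period
    return (last - first) // period + 1

DAY = 24 * 3600
for period in (600, 3600):
    # One column per second at most, so max columns per row == period.
    print(period, rows_touched(0, DAY, period), period)
```

For a one-day range aligned to midnight this gives 144 rows of at most 600 columns versus 24 rows of at most 3600 columns. The maximum number of cells is the same in both cases (86400), so any difference in scan time would have to come from per-row overhead rather than from the amount of data.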

On Apr 16, 2011, at 11:51 PM, Ted Dunning wrote:

> TsDB has more columns than it appears at first glance.  They store all of the observations
> for a relatively long time interval in a single row.
> 
> You may have spotted that right off (I didn't).
> 
> On Sat, Apr 16, 2011 at 1:27 AM, Yves Langisch <yves@langisch.ch> wrote:
> As I'm about to plan a similar app, I have studied the HBase schema of the OpenTSDB project:
> 
> http://opentsdb.net/schema.html
> 
> The OpenTSDB approach seems to have many rows instead of many columns. What is the better
> schema design in terms of query performance? My experience so far is that a wide schema with
> many columns but fewer rows performs better. A 'horizontal' table scan seems to be better suited
> for fast queries.
> 
> Yves
> 

