hbase-user mailing list archives

From zsongbo <zson...@gmail.com>
Subject Re: financial time series database
Date Wed, 01 Apr 2009 03:09:08 GMT
If the row key is a date/time and the data arrives in date/time order, then
while loading/inserting data into the table only one region (on one node) is
active and receives all the new data. The load performance will be poor.

On Wed, Apr 1, 2009 at 11:08 AM, zsongbo <zsongbo@gmail.com> wrote:

> If the row key is a date/time and the data arrives in date/time order,
> then while loading/inserting data into the table only one region (on one
> node) is active and receives all the new data. The load performance will
> be poor.
>
>
> On Wed, Apr 1, 2009 at 10:25 AM, Bradford Cross <
> bradford.n.cross@gmail.com> wrote:
>
>> Greetings,
>>
>> I am prototyping a financial time series database on top of HBase and
>> trying to get my head around what a good design would look like.
>>
>> As I understand it, I have rows, column families, columns and cells.
>>
>> Since the only thing that HBase really "indexes" is row keys, it seems
>> natural in a way to use the date/time as the row key.
>>
>> As a simple example:
>>
>> Bar data:
>>
>> {
>>   "2009/1/17" : {
>>     "open":"100",
>>     "high":"102",
>>     "low":"99",
>>     "close":"101",
>>     "volume":"1000256"
>>   }
>> }
>>
>>
>> Quote data:
>>
>> {
>>   "2009/1/17:11:23:04" : {
>>     "bid":"100.01",
>>     "ask":"100.02",
>>     "bidsize":"10000",
>>     "asksize":"100200"
>>   }
>> }
>>
>> But there are many other issues to think about.
>>
>> In financial time series data we have small amounts of data within each
>> "observation" and we can have lots of observations. We can have millions
>> of observations per time series (f.ex. all historical trade and quote data
>> for a particular stock since 1993) across hundreds of thousands of
>> individual instruments (f.ex. across all stocks that have traded since
>> 1993).
>>
>> The write patterns fit HBase nicely, because it is a write-once, append
>> pattern. This is followed by loads of offline processes for simulating
>> trading models and such. These query patterns look like "all quotes for
>> all stocks between the dates of 1/1/1996 and 12/31/2008." So the querying
>> is typically across a date range, and we can further filter the query by
>> instrument type.
>>
>> So I am not sure what makes sense for efficiency, because I do not
>> understand HBase well enough yet.
>>
>> What kinds of mixes of rows, column families, and columns should I be
>> thinking about?
>>
>> Does my simplistic approach make any sense? That would mean each row is a
>> key-value pair where the key is the date/time and the value is the
>> "observation." I suppose this leads to a "table per time series" model.
>> Does that make sense, or is there overhead to having lots of tables?
>>
>
>
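Two details bear on the key design in the quoted question. HBase compares row keys as raw bytes, so unpadded dates such as `2009/1/17` do not sort chronologically, and a date-range query maps to a scan between a start row (inclusive) and a stop row (exclusive). The sketch below models both with a sorted Python list and `bisect`; the `scan` helper, the instrument symbols, and the `date|symbol` / `symbol|date` key layouts are hypothetical, chosen only for illustration.

```python
import bisect
from datetime import date, timedelta

def scan(sorted_keys: list[str], start: str, stop: str) -> list[str]:
    """Return keys k with start <= k < stop, like a start-row/stop-row scan."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Unpadded dates in the style of the quoted examples: October sorts before February.
unpadded = ["2009/1/17", "2009/2/1", "2009/10/5"]
print(sorted(unpadded))  # ['2009/1/17', '2009/10/5', '2009/2/1']

# strftime zero-pads month and day, so string order matches date order.
padded_keys = [d.strftime("%Y/%m/%d")
               for d in (date(2009, 1, 17), date(2009, 2, 1), date(2009, 10, 5))]
print(sorted(padded_keys) == padded_keys)  # True

SYMBOLS = ["AAPL", "IBM"]  # hypothetical instruments
days = [(date(2008, 12, 30) + timedelta(days=i)).strftime("%Y/%m/%d")
        for i in range(5)]

time_first = sorted(f"{d}|{s}" for d in days for s in SYMBOLS)
symbol_first = sorted(f"{s}|{d}" for d in days for s in SYMBOLS)
start, stop = "2009/01/01", "2009/01/03"

# Date-leading keys: all instruments in the range come from one contiguous scan.
one_scan = scan(time_first, start, stop)
# Symbol-leading keys: the same range needs one scan per instrument.
per_symbol = {s: scan(symbol_first, f"{s}|{start}", f"{s}|{stop}") for s in SYMBOLS}

print(one_scan)
print(per_symbol)
```

Under these assumptions, date-leading keys answer "all quotes for all stocks between two dates" with one contiguous scan but, as the reply at the top points out, send sequential inserts to a single region; symbol-leading keys spread the inserts but turn the same query into one scan per instrument.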
