hbase-user mailing list archives

From Erik Holstad <erikhols...@gmail.com>
Subject Re: Time-series schema
Date Fri, 29 Oct 2010 14:39:33 GMT
Hey Brian!
One thing that you could do to accommodate the second query is to add another
table: either set up an index that points you back to the original row, or just
put the data in there a second time, with the specific identifier as the row key
and the timestamps as columns.
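
As a rough sketch of the second option (storing the data twice), here is the
layout modelled with plain Python dicts rather than a real HBase client. The
table names by_time and by_id, the helper functions, and the zero-padded key
formats are all assumptions made for illustration; nothing here is HBase API.

```python
from collections import defaultdict

# In-memory stand-ins for two HBase tables.
# Each maps row key -> {column qualifier -> value}.
by_time = defaultdict(dict)  # row key: timestamp,  columns: identifiers
by_id = defaultdict(dict)    # row key: identifier, columns: timestamps


def id_key(identifier: int) -> str:
    """Zero-pad the id so string order matches numeric order (assumed width 5)."""
    return f"{identifier:05d}"


def put(timestamp: str, identifier: int, value: bytes) -> None:
    """Write the same cell into both layouts."""
    by_time[timestamp][id_key(identifier)] = value
    by_id[id_key(identifier)][timestamp] = value


def slice_by_time(timestamp: str) -> dict:
    """All identifiers recorded at one timestamp, ordered by identifier."""
    return dict(sorted(by_time.get(timestamp, {}).items()))


def slice_by_id(identifier: int) -> dict:
    """All timestamps recorded for one identifier, in time order."""
    return dict(sorted(by_id.get(id_key(identifier), {}).items()))
```

Calling put("20101029_00000150", 42, b"1.5") makes the value reachable through
both slice_by_time("20101029_00000150") and slice_by_id(42). The price is that
every write happens twice, and the two copies have to be kept in step by the
application.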


On Fri, Oct 29, 2010 at 2:10 AM, Brian O'Kennedy <brokenn@gmail.com> wrote:

> Hi,
> I apologise if this has been asked a million times, but after some searching
> I'm still not sure if this is a good idea. I've got my local (currently
> standalone) server running, Thrift bindings etc. and have started playing
> with schemas.
> I'd like to store a large amount of numeric time-series data using
> HBase. The data can be visualised as a 2d array.
> Row axis is a timestamp (YYYYMMDD_Milliseconds), with between 1 and
> 100 million rows per day.
> Column axis is a numeric identifier (in the range of about 20 000 unique ids).
> Each cell of this array is a small number of values representing some
> information for this identifier at this timestamp.
> The array is very sparse: some identifiers will only have one entry per day,
> some will have millions. I thought HBase might be a good fit due to the
> scaling (I've got many terabytes of data to store) and the built-in
> versioning of cells. Occasionally I need to overwrite previous cell values,
> but always keep a complete history of previous values to produce
> 'point-in-time' views of the dataset.
> My first HBase schema was along the lines of having a row per timestamp
> (YYYYMMDD_Milliseconds) containing a column family for the identifiers, with
> values stored in there.
> This gives me nice and fast lookup by timestamp, but does not work at all
> for looking up all values for a specific identifier over all times. Going
> back to the 2d array description, I need to be able to slice along rows
> (timestamps) or columns (identifiers).
> Any tips as to how to achieve something like this using HBase? Am I using the
> wrong tool for the job? Am I completely misunderstanding how this all
> works?
> Thanks,
>   Brian

Regards Erik
