hbase-user mailing list archives

From Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>
Subject Re: Modeling column families
Date Sat, 24 Apr 2010 20:10:18 GMT
Ryan,

Exactly, eventually, we will be storing data continuously on N beds in the ICU.  So, if it's
waveform data, it's probably going to be 125 Hz, which is about 3.9 billion points per bed per
year, times N beds.  I've been trying to find out what sort of search terms to use to dive
deeper into "compound keys" with respect to NoSQL solutions.
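
For reference, the 3.9 billion figure matches one year of continuous sampling at 125 Hz.
Here is a rough sketch of that arithmetic, assuming a single waveform channel per bed and
a 365-day year (both are my assumptions, and the function names are made up):

```python
SAMPLE_RATE_HZ = 125            # waveform sampling rate from the estimate above
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def points_per_bed(days, rate_hz=SAMPLE_RATE_HZ):
    """Samples recorded for one bed (one channel assumed) over `days` days."""
    return rate_hz * SECONDS_PER_DAY * days


def total_points(beds, days):
    """Samples recorded across `beds` beds over `days` days."""
    return beds * points_per_bed(days)


if __name__ == "__main__":
    # One bed, one year of continuous recording.
    print(points_per_bed(365))
    # Ten beds, one year (the bed count is only an example value for N).
    print(total_points(10, 365))
```

With more than one waveform channel per bed, the totals scale up by the channel count.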

You mention tall tables - this sounds consistent with what Erik and Andrey have said.  Given
that, just to clarify my understanding, I'm probably looking at a single table with only one
column (the value, which Andrey names "series"?) and billions of rows, right?
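
To check my understanding of "compound keys" in a tall table, here is a minimal sketch of
how I imagine the row key could be built.  The layout (bed id followed by a millisecond
timestamp) and the function names are my own assumptions, not something anyone on the
list has proposed.  The only fact it relies on is that HBase sorts rows by the raw bytes
of the key, so fixed-width big-endian fields keep one bed's samples contiguous and in
time order.

```python
import struct

# Hypothetical key layout: 4-byte bed id + 8-byte timestamp in milliseconds,
# both unsigned and big-endian so byte-wise order equals numeric order.
KEY_FORMAT = ">IQ"


def make_row_key(bed_id, timestamp_ms):
    """Pack (bed_id, timestamp_ms) into a 12-byte compound row key."""
    return struct.pack(KEY_FORMAT, bed_id, timestamp_ms)


def parse_row_key(key):
    """Unpack a compound row key back into (bed_id, timestamp_ms)."""
    return struct.unpack(KEY_FORMAT, key)


def scan_bounds(bed_id, start_ms, stop_ms):
    """Start (inclusive) and stop (exclusive) keys for one bed's time window."""
    return make_row_key(bed_id, start_ms), make_row_key(bed_id, stop_ms)


if __name__ == "__main__":
    keys = [make_row_key(2, 5), make_row_key(1, 1000), make_row_key(1, 8)]
    # Sorting the raw bytes groups by bed first, then by time within a bed.
    for key in sorted(keys):
        print(parse_row_key(key))
```

Each row would then hold a single value, and a range scan between the two keys from
scan_bounds would return one bed's samples for that time window.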

That said, the decision to break up the values into multiple column families is just a function
of performance and how I want the data physically stored.  Are there any other major points
to consider when deciding what column families to have?  (I drew this conclusion from your
hbase-nosql presentation on slideshare.)

Thanks all!

--Andrew

On Apr 24, 2010, at 12:59 PM, Ryan Rawson wrote:

> For example if you are storing timeseries data for a monitoring
> system, you might want to store it by row, since the number of points
> for a single system might be arbitrarily large (think: 2 years+ of
> data). In this case if the expected data set size per row is larger
> than what a single machine could conceivably store, Cassandra would
> not work for you in this case (since each row must be stored on a
> single (er N) node(s)).

