hbase-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: learning hbase - schema design advice
Date Thu, 21 Jan 2010 16:17:35 GMT
On Thu, Jan 21, 2010 at 1:09 AM, Dan Washusen <dan@reactive.org> wrote:
> Have you read the bigtable paper linked off the front page of HBase?  It
> does a good job of explaining the concepts.  Basically it's a distributed
> sorted map (think java.util.NavigableMap but split over many machines).  If
> you know the key of the row you are looking for HBase can fetch it very
> quickly.  If you don't know the key you'll have to resort to scanning all
> the rows to find the data you are interested in (just like a SQL query that
> can't take advantage of an index)...
>
> Do the queries need to immediately reflect any writes or is it sufficient
> for them to become eventually consistent?  If you can live with eventual
> consistency then you could write some map reduce jobs that duplicate a
> master table into reporting tables (like you would for data
> warehousing/reporting on an RDBMS).
>
> I'm sure some of the more experienced users will have more insight but that
> might get you started...
>
> Cheers,
> Dan
>
> p.s. bold text doesn't seem to come through the mailing list...
>
> 2010/1/21 canucks <anhlon@gmail.com>
>
>>
>> Hi,
>>
>> i'm pretty interested in learning hbase.  what i want to do is store
>> financial data for analytical/graphing/displaying purposes.  there are hundreds
>> of millions of rows and of course, i want fast response when retrieving the
>> data.
>>
>> if i were to do it in a RDBMS it would be
>> REPORT, MARKET, OPERATING_DATE, OPERATING_INTERVAL, HOUR_ENDING, VALUE
>> where the bolded column names are the PK.  if i were to store this in hbase
>> would it look like this?
>>
>> REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING.TIMESTAMP{
>>        VALUE: 92.29
>> }
>>
>> so that i can do queries like below:
>> - give me all reports with the name of "ABC"
>> - give me all the values where OPERATING_DATE is from jan-01-2010 to
>> jan-10-2010
>> - give me all the values where OPERATING_DATE is from jan-01-2010 to
>> jan-10-2010 and HOUR_ENDING is between 5 and 10 (or simply 5 or variations
>> thereof)
>>
>> in short, is hbase the wrong way to go about it or would it yield better
>> performance?  also, you folks happen to know any good links/articles on
>> hbase table & schema?
>>
>> thanks
>> --
>> View this message in context:
>> http://old.nabble.com/learning-hbase---schema-design-advice-tp27252203p27252203.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
>
I went looking for a paper on "how to convert my RDBMS mindset to a
key-value store mindset". Here is something that got me started:

http://s-expressions.com/2009/03/08/hbase-on-designing-schemas-for-column-oriented-data-stores/
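To make the "distributed sorted map" idea from Dan's reply concrete, here is a
minimal sketch in plain Python. It does not use the HBase API at all: the
SortedTable class, the row_key helper, the "EAST" market name, and the sample
values other than 92.29 are all hypothetical, and the key drops the trailing
TIMESTAMP from the original proposal to keep things short. It only assumes
that rows are kept sorted by key and that field values never contain "."
themselves.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for a table whose rows are sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        # Insert at the position that keeps the keys sorted; overwrite on exact match.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]


def row_key(report, market, date, interval, hour_ending):
    # Dates as YYYYMMDD and zero-padded numbers, so string order matches logical order.
    return f"{report}.{market}.{date}.{interval:02d}.{hour_ending:02d}"


table = SortedTable()
table.put(row_key("ABC", "EAST", "20100101", 1, 5), 92.29)
table.put(row_key("ABC", "EAST", "20100105", 1, 12), 88.10)
table.put(row_key("ABC", "EAST", "20100112", 1, 7), 90.00)
table.put(row_key("XYZ", "EAST", "20100103", 1, 6), 75.50)

# "All rows for report ABC": a prefix scan. "/" is the character right after "."
# in ASCII, so "ABC/" is the first possible key past every "ABC." key.
abc_rows = list(table.scan("ABC.", "ABC/"))

# "OPERATING_DATE from jan-01-2010 to jan-10-2010" within one report and market:
# a range scan with an exclusive stop key one day past the end date.
in_range = list(table.scan("ABC.EAST.20100101", "ABC.EAST.20100111"))

# HOUR_ENDING is the last key part, so it cannot narrow the scan range here;
# in this sketch it is filtered row by row after the scan.
hours_5_to_10 = [(k, v) for k, v in in_range if 5 <= int(k.split(".")[-1]) <= 10]
```

With this key order, the first two queries touch only a contiguous slice of
the sorted rows. A date-range query across all reports would not, because
OPERATING_DATE is not the leading part of the key; that is the kind of case
where a second table keyed by date first, populated by a job like the one Dan
describes, might be worth considering.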
