hbase-user mailing list archives

From Dan Washusen <...@reactive.org>
Subject Re: learning hbase - schema design advice
Date Thu, 21 Jan 2010 06:09:29 GMT
Have you read the Bigtable paper linked from the HBase front page?  It
does a good job of explaining the concepts.  Basically, HBase is a distributed
sorted map (think java.util.NavigableMap, but split over many machines).  If
you know the key of the row you are looking for, HBase can fetch it very
quickly.  If you don't know the key, you'll have to resort to scanning all
the rows to find the data you are interested in (just like a SQL query that
can't take advantage of an index)...
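
To make the sorted-map analogy concrete, here is a small in-memory sketch
using java.util.TreeMap (a NavigableMap), not the HBase client API.  It
assumes the composite key proposed below
(REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING), with ISO
dates and zero-padded hours so that string order matches date and hour
order; the class, method names and sample rows are hypothetical.  A known
key is a direct lookup, a known key prefix such as the report name becomes
a bounded range over contiguous rows, and a condition on OPERATING_DATE,
which sits in the middle of the key, has to examine every row.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an HBase table: row key -> VALUE.
// This is NOT HBase client code; it only illustrates sorted-key access patterns.
class SortedMapSketch {

    // Assumed composite key: REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING
    // Dates are ISO (yyyy-MM-dd) and hours zero-padded, so string order matches
    // chronological/numeric order. Assumes no field contains a '.'.
    static String key(String report, String market, String date, String interval, int hourEnding) {
        return String.join(".", report, market, date, interval, String.format("%02d", hourEnding));
    }

    // Made-up sample rows for illustration only.
    static NavigableMap<String, Double> sampleTable() {
        NavigableMap<String, Double> table = new TreeMap<>();
        table.put(key("ABC", "EAST", "2010-01-01", "I1", 5), 92.29);
        table.put(key("ABC", "WEST", "2010-01-02", "I1", 7), 88.10);
        table.put(key("XYZ", "EAST", "2010-01-03", "I1", 5), 75.00);
        table.put(key("XYZ", "EAST", "2010-02-01", "I1", 12), 60.50);
        return table;
    }

    // Full key known: a direct lookup.
    static Double get(NavigableMap<String, Double> table, String rowKey) {
        return table.get(rowKey);
    }

    // Key prefix known: only the contiguous range of matching rows is visited.
    // '/' is the ASCII character right after '.', so "ABC/" is an exclusive
    // upper bound for every key starting with "ABC.".
    static NavigableMap<String, Double> reportScan(NavigableMap<String, Double> table, String report) {
        return table.subMap(report + ".", true, report + "/", false);
    }

    // OPERATING_DATE is not a key prefix: every row must be examined.
    static int fullScanByDate(NavigableMap<String, Double> table, String from, String to) {
        int matches = 0;
        for (String rowKey : table.keySet()) {
            String date = rowKey.split("\\.")[2];
            if (date.compareTo(from) >= 0 && date.compareTo(to) <= 0) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> table = sampleTable();
        System.out.println(get(table, key("ABC", "EAST", "2010-01-01", "I1", 5)));
        System.out.println(reportScan(table, "ABC").keySet());
        System.out.println(fullScanByDate(table, "2010-01-01", "2010-01-10"));
    }
}
```

Note that a date format like jan-01-2010 would not sort chronologically as
a string, which is why the sketch assumes yyyy-MM-dd.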

Do the queries need to immediately reflect any writes, or is it sufficient
for them to become eventually consistent?  If you can live with eventual
consistency, then you could write some MapReduce jobs that duplicate a
master table into reporting tables (like you would for data
warehousing/reporting on an RDBMS).
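
A rough sketch of that idea, again with TreeMaps standing in for HBase
tables rather than real HBase or MapReduce APIs: a batch copy re-keys each
master row with OPERATING_DATE (then HOUR_ENDING) first, so a date-range
query becomes a contiguous range and the HOUR_ENDING filter only touches
rows inside that range.  The key layouts, names and sample data are
assumptions for illustration.  Because the copy runs as a batch, the
reporting table lags behind writes to the master table, which is the
eventual consistency trade-off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: TreeMaps stand in for an HBase master table and a
// reporting table. A real setup would do the copy in a MapReduce job.
class ReportingTableSketch {

    // Assumed master key:    REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING
    // Assumed reporting key: OPERATING_DATE.HOUR_ENDING.REPORT.MARKET.OPERATING_INTERVAL
    static String toReportingKey(String masterKey) {
        String[] p = masterKey.split("\\.");
        return String.join(".", p[2], p[4], p[0], p[1], p[3]);
    }

    // Made-up master rows (ISO dates, zero-padded hours).
    static NavigableMap<String, Double> sampleMaster() {
        NavigableMap<String, Double> master = new TreeMap<>();
        master.put("ABC.EAST.2010-01-01.I1.05", 92.29);
        master.put("ABC.WEST.2010-01-02.I1.07", 88.10);
        master.put("XYZ.EAST.2010-01-03.I1.12", 75.00);
        master.put("XYZ.EAST.2010-02-01.I1.06", 60.50);
        return master;
    }

    // The batch "copy" step: every master row is rewritten under the new key.
    static NavigableMap<String, Double> buildReportingTable(Map<String, Double> master) {
        NavigableMap<String, Double> reporting = new TreeMap<>();
        for (Map.Entry<String, Double> e : master.entrySet()) {
            reporting.put(toReportingKey(e.getKey()), e.getValue());
        }
        return reporting;
    }

    // Dates in [from, to] form one contiguous range; "to" + "/" is an exclusive
    // bound just past every key starting with "to.". The hour filter is applied
    // only to rows inside that range.
    static List<Double> valuesByDateAndHour(NavigableMap<String, Double> reporting,
                                            String from, String to, int minHour, int maxHour) {
        List<Double> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : reporting.subMap(from + ".", true, to + "/", false).entrySet()) {
            int hour = Integer.parseInt(e.getKey().split("\\.")[1]);
            if (hour >= minHour && hour <= maxHour) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> reporting = buildReportingTable(sampleMaster());
        System.out.println(valuesByDateAndHour(reporting, "2010-01-01", "2010-01-10", 5, 10));
    }
}
```

The cost of this design is storing each value more than once, roughly one
copy per query pattern you want to serve with a range scan.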

I'm sure some of the more experienced users will have more insight, but that
might get you started...

Cheers,
Dan

p.s. bold text doesn't seem to come through the mailing list...

2010/1/21 canucks <anhlon@gmail.com>

>
> Hi,
>
> I'm pretty interested in learning HBase.  What I want to do is store
> financial data for analytical/graphing/displaying purposes.  There are
> hundreds of millions of rows and, of course, I want fast response times
> when retrieving the data.
>
> If I were to do it in an RDBMS, the columns would be
> REPORT, MARKET, OPERATING_DATE, OPERATING_INTERVAL, HOUR_ENDING, VALUE
> where the bolded column names are the PK.  If I were to store this in
> HBase, would it look like this?
>
> REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING.TIMESTAMP{
>        VALUE: 92.29
> }
>
> So that I can run queries like the ones below:
> - give me all reports with the name "ABC"
> - give me all the values where OPERATING_DATE is from jan-01-2010 to
> jan-10-2010
> - give me all the values where OPERATING_DATE is from jan-01-2010 to
> jan-10-2010 and HOUR_ENDING is between 5 and 10 (or simply 5, or
> variations thereof)
>
> In short, is HBase the wrong way to go about it, or would it yield better
> performance?  Also, do you folks happen to know any good links/articles on
> HBase table and schema design?
>
> thanks
> --
> View this message in context:
> http://old.nabble.com/learning-hbase---schema-design-advice-tp27252203p27252203.html
> Sent from the HBase User mailing list archive at Nabble.com.
