Message-ID: <27252203.post@talk.nabble.com>
Date: Wed, 20 Jan 2010 19:11:41 -0800 (PST)
From: canucks <anhlon@gmail.com>
To: hbase-user@hadoop.apache.org
Subject: learning hbase - schema design advice


Hi,

I'm pretty interested in learning HBase.  What I want to do is store
financial data for analysis, graphing, and display.  There are hundreds
of millions of rows and, of course, I want fast response times when
retrieving the data.

If I were to do it in an RDBMS, the table would be
REPORT, MARKET, OPERATING_DATE, OPERATING_INTERVAL, HOUR_ENDING, VALUE
where the bolded column names are the PK.  If I were to store this in HBase,
would it look like this?

REPORT.MARKET.OPERATING_DATE.OPERATING_INTERVAL.HOUR_ENDING.TIMESTAMP{
	VALUE: 92.29
}
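
To make the proposed layout concrete, here is a minimal Python sketch of how
such a composite row key could be assembled.  HBase keeps rows sorted by the
bytes of the row key, so the sketch uses fixed-width, zero-padded fields to
make string order match logical order.  The function name make_row_key, the
sample market "MKT1", the YYYYMMDD date format, and treating
OPERATING_INTERVAL as an integer are all assumptions made for illustration.
The TIMESTAMP part is left out of the key on the assumption that the
timestamp HBase already stores with every cell is enough.

```python
from datetime import date

def make_row_key(report, market, operating_date, operating_interval, hour_ending):
    """Join the PK columns into one string key (hypothetical format)."""
    return ".".join([
        report,
        market,
        operating_date.strftime("%Y%m%d"),  # fixed width, so dates sort chronologically
        f"{operating_interval:04d}",        # assumed to be an integer; zero-padded
        f"{hour_ending:02d}",               # zero-padded so "05" sorts before "10"
    ])

# Hypothetical sample keys, deliberately created out of order.
keys = [
    make_row_key("ABC", "MKT1", date(2010, 1, 10), 60, 5),
    make_row_key("ABC", "MKT1", date(2010, 1, 2), 60, 10),
    make_row_key("ABC", "MKT1", date(2010, 1, 2), 60, 5),
]

# Sorting the strings gives the order in which a sorted store would hold the rows.
sorted_keys = sorted(keys)
print(sorted_keys)
```

Without the zero padding, an hour of "10" would sort before "5", and a range
over hours or dates would no longer be a contiguous run of keys.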

so that I can run queries like the ones below:
- give me all reports with the name "ABC"
- give me all the values where OPERATING_DATE is from Jan-01-2010 to
Jan-10-2010
- give me all the values where OPERATING_DATE is from Jan-01-2010 to
Jan-10-2010 and HOUR_ENDING is between 5 and 10 (or simply 5, or variations
thereof)
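
As a rough sketch of how these three queries relate to a sorted row key, the
Python below simulates a key-ordered table with a sorted list and bisect.  It
does not use the HBase client.  The sample rows, the market "MKT1", the
interval "0060", and the helper names scan and prefix_scan are hypothetical,
and the key format (REPORT.MARKET.YYYYMMDD.INTERVAL.HOUR) is an assumption.

```python
from bisect import bisect_left

# Hypothetical sample rows: (row key, VALUE), kept sorted by key.
ROWS = sorted([
    ("ABC.MKT1.20100101.0060.05", 92.29),
    ("ABC.MKT1.20100105.0060.07", 88.10),
    ("ABC.MKT1.20100105.0060.12", 75.00),
    ("ABC.MKT1.20100111.0060.06", 91.00),
    ("XYZ.MKT1.20100105.0060.07", 10.00),
])
KEYS = [key for key, _ in ROWS]

def scan(start, stop):
    """Return rows with start <= key < stop, like a start/stop-row scan."""
    return ROWS[bisect_left(KEYS, start):bisect_left(KEYS, stop)]

def prefix_scan(prefix):
    """Return every row whose key begins with prefix."""
    # Stop key: the prefix with its last character incremented by one.
    stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return scan(prefix, stop)

# Query 1: everything for report "ABC" is one contiguous key range.
all_abc = prefix_scan("ABC.")

# Query 2: Jan-01-2010 to Jan-10-2010, with REPORT and MARKET fixed.
# The stop key is exclusive, so it names the day after the range ends.
jan_1_to_10 = scan("ABC.MKT1.20100101", "ABC.MKT1.20100111")

# Query 3: same date range, then keep only HOUR_ENDING between 5 and 10.
# The hour is the last key field, so here it is filtered after the scan.
hours_5_to_10 = [
    (key, value) for key, value in jan_1_to_10
    if 5 <= int(key.rsplit(".", 1)[1]) <= 10
]
```

In this layout the date range is a single contiguous range only because
REPORT and MARKET are fixed at the front of the key.  A date range across all
reports would touch many separate ranges, so the field order in the key
should follow the most common query.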

In short, is HBase the wrong way to go about this, or would it yield better
performance than an RDBMS?  Also, do you folks happen to know any good
links/articles on HBase table & schema design?

Thanks