hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Advices for HTable schema
Date Tue, 03 Jul 2012 11:57:31 GMT

You're overthinking this.

Take a step back and remember that you can store anything you want as a byte stream in a column.


So you have a record that could be a text blob. Store it in one column. Use JSON to define
its structure and fields. 

The only thing that makes it difficult is that you will need to pull out everything just to
insert or update something.
So then maybe segment your data into logical blocks. Like a column that stores the physical
attributes of the person. 
Another column that stores the list of addresses for the person.
Another column that stores the list of aliases used by the person. 
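
To make the segmentation idea concrete, here is a minimal sketch in Python. It does not talk to HBase: the row is modeled as a plain dict from column qualifier to bytes, and the helper names (put_block, get_block), the qualifiers, and the sample values are all made up for illustration. It only shows that each logical block can be serialized to JSON, stored as a byte stream, and updated without pulling out the other blocks.

```python
import json

# Hypothetical in-memory stand-in for one HBase row: column qualifier -> bytes.
# A real client would issue a put/get per column instead of touching a dict.
row = {}

def put_block(row, qualifier, obj):
    """Serialize one logical block as JSON and store it as a byte stream."""
    row[qualifier] = json.dumps(obj, sort_keys=True).encode("utf-8")

def get_block(row, qualifier):
    """Read one logical block back without touching the other columns."""
    return json.loads(row[qualifier].decode("utf-8"))

# Illustrative sample data only; none of it comes from the thread.
put_block(row, "physical", {"height_cm": 180, "eyes": "brown"})
put_block(row, "addresses", [{"city": "Montreal"}])
put_block(row, "aliases", ["JM"])

# Adding an address reads and rewrites only the "addresses" column.
addresses = get_block(row, "addresses")
addresses.append({"city": "Chicago"})
put_block(row, "addresses", addresses)
```

With a single JSON blob per person, the same update would mean deserializing and rewriting the whole record; with one column per block, only the block that changed is rewritten.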

Don't think in relational terms. HBase isn't relational and ER is not the best way to model
in a NoSQL database. 
Think IMS/COBOL (mainframe) or Dick Pick's Revelation's OS. 

The only relationships in HBase are weak relationships between tables. 
Column families currently have some nasty side effects, so you may want to consider carefully
how you apply them.

Think in terms of records. 

Look at storing data using Avro. 

On Jul 2, 2012, at 8:56 PM, Jean-Marc Spaggiari wrote:

> 2012/7/2, Amandeep Khurana <amansk@gmail.com>:
>>> Here are the 2 options now. Both with a new table.
>>> 1) I store the key "personID" and a:a1 to a:an for the addresses.
>>> 2) I store the key "personID" + "address
>>> In both I will have the same amount of data. In #1 total size will be
>>> smaller since the key will be stored only once.
>> The size will be the same. The underlying HFile will store 1 row per cell
>> and the number of cells in both cases is the same.
>> However, the first approach with multiple columns for addresses needs you to
>> keep track of the number and makes updates, deletes, additions complicated
>> as I highlighted earlier. The second option with putting both things in the
>> key makes life much easier.
>> If the data is primarily being accessed independently, I'd go with option 2.
> Oh! I see! My misunderstanding comes from my lack of HBase
> knowledge/reflex. I forgot it was storing the data that way. So I
> think I will most probably give a try to this 2nd option! Thanks for
> sharing your ideas all over the day.
> JM
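
For readers following the quoted discussion, the two layouts can be sketched as lists of cells. This is a rough Python model, not HBase code: each cell is a (row key, column, value) tuple, and the function names, the "|" key separator, the "a:x" column, and the empty value in option 2 are assumptions made for illustration. It only demonstrates the point made in the quote, that both options produce the same number of cells and that the row key is carried by every cell.

```python
def option1_cells(person_id, addresses):
    """Option 1: one row per person, columns a:a1 .. a:an hold the addresses."""
    return [
        (person_id, "a:a%d" % (i + 1), addr)
        for i, addr in enumerate(addresses)
    ]

def option2_cells(person_id, addresses):
    """Option 2: one row per (person, address); the address is in the row key.

    The "|" separator, the "a:x" column, and the empty value are placeholders.
    """
    return [(person_id + "|" + addr, "a:x", "") for addr in addresses]

# Illustrative sample data only.
addresses = ["addr-A", "addr-B", "addr-C"]
cells1 = option1_cells("person42", addresses)
cells2 = option2_cells("person42", addresses)

# Same number of cells either way, and each cell repeats its row key.
same_cell_count = len(cells1) == len(cells2)
```

In option 2, adding or deleting an address is a single-row operation on a known key, whereas option 1 requires tracking which a:aN qualifiers are in use.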
