hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/DataModel" by JeanDanielCryans
Date Fri, 11 Jul 2008 14:39:00 GMT

New page:
 * [#intro Introduction]
 * [#overview Overview]

= Introduction =

The Bigtable data model, and therefore the HBase data model too since it is a clone, is particularly
well adapted to data-intensive systems. You cannot get high scalability from a relational database
simply by adding more machines, because its data model assumes a single-machine architecture.
For example, a JOIN between two tables is done in memory and does not take into account the
possibility that the data has to travel over the wire. Companies that did offer distributed
relational databases had a lot of redesign to do, which is why they charge high licensing costs.
The other option is to use replication, and when the slaves become overloaded with ''writes'',
the last resort is to shard the tables into sub-databases. At that point, data normalization
is something you only remember seeing in class, which is why the data model presented
in this paper shouldn't bother you at all.

= Overview =

To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], Map<byte[], Map<Long,
byte[]>>>>. The first Map maps row keys to their ''column families''. The second
maps column families to their ''column keys''. The third one maps ''column keys'' to their ''timestamps''.
Finally, the last one maps the timestamps to a single value. The keys are typically strings,
the timestamp is a long and the value is an uninterpreted array of bytes. The lookup path
can thus be summarized as:

row key+column key+timestamp -> value
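The nested-map view above can be sketched in plain Java. This is a conceptual model only, not how HBase is actually implemented; the class and method names (`ConceptualHBase`, `put`, `get`) are illustrative. The sketch orders timestamps newest-first so that a plain `get` returns the most recent version, mirroring HBase's default behavior:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ConceptualHBase {
    // byte[] has no natural ordering in Java, so keys are compared
    // lexicographically, like HBase's row/column ordering.
    static final Comparator<byte[]> BYTES = (a, b) -> Arrays.compare(a, b);

    // row key -> column family -> column key -> timestamp -> value
    static final NavigableMap<byte[],
            NavigableMap<byte[],
                NavigableMap<byte[],
                    NavigableMap<Long, byte[]>>>> table = new TreeMap<>(BYTES);

    static void put(byte[] row, byte[] family, byte[] column, long ts, byte[] value) {
        table.computeIfAbsent(row, k -> new TreeMap<>(BYTES))
             .computeIfAbsent(family, k -> new TreeMap<>(BYTES))
             // timestamps sorted descending so the newest version comes first
             .computeIfAbsent(column, k -> new TreeMap<>(Comparator.reverseOrder()))
             .put(ts, value);
    }

    static byte[] get(byte[] row, byte[] family, byte[] column) {
        var families = table.get(row);
        if (families == null) return null;
        var columns = families.get(family);
        if (columns == null) return null;
        var versions = columns.get(column);
        if (versions == null) return null;
        // First entry is the newest timestamp because of the reverse ordering.
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] fam = "info".getBytes(StandardCharsets.UTF_8);
        byte[] col = "name".getBytes(StandardCharsets.UTF_8);
        put(row, fam, col, 1L, "old".getBytes(StandardCharsets.UTF_8));
        put(row, fam, col, 2L, "new".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(get(row, fam, col), StandardCharsets.UTF_8)); // prints "new"
    }
}
```

Note that the innermost key is `Long` rather than `long` because Java generics require boxed types; conceptually it is still a 64-bit timestamp.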
