hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: When to expand vertically vs. horizontally in Hbase
Date Mon, 08 Jul 2013 15:27:47 GMT
Ian, 

You still want to stick to your relational modeling.  :-(

You need to play around more with hierarchical models to get a better appreciation. 

If you model as if you're working with an RDBMS, then you will end up with a poor HBase table
design. 

In ERD models, you don't have the concept of a weak relationship. 
A weak relationship means the model itself defines no relationship between the entities. It's the
application that manages that. 

Imagine a reference or lookup table that in the model has no association. Using our example
of an Order Entry system, it's the application that hits the customer lookup table to capture
relevant information for the order.  That's why I refer to it as a weak association. 
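
A minimal sketch of the weak association described above, not actual HBase client code. It assumes plain Python dicts standing in for the two tables (row key mapped to column/value pairs), and the table names, column names, and the place_order helper are all hypothetical. The point it illustrates is that nothing in the data model links an order to a customer; the application does the lookup and copies what it needs into the order row.

```python
# In-memory dicts stand in for HBase tables: row key -> {"family:qualifier": value}.
# All names here are illustrative assumptions, not part of any HBase API.
customer_table = {
    "cust-001": {
        "info:name": "Acme Corp",
        "info:ship_to": "12 Main St",
        "info:credit_limit": "5000",
    },
}
order_table = {}


def place_order(order_id, customer_id, items):
    """Application-managed lookup: read the customer row, then duplicate
    the fields the order needs into the order row itself."""
    customer = customer_table.get(customer_id)
    if customer is None:
        # The store enforces nothing here; this check exists only
        # because the application chose to write it.
        raise KeyError("unknown customer: " + customer_id)

    order_table[order_id] = {
        "cust:id": customer_id,
        "cust:name": customer["info:name"],        # duplicated on purpose
        "cust:ship_to": customer["info:ship_to"],  # duplicated on purpose
        "order:items": ",".join(items),
    }
    return order_table[order_id]


# Usage: write one order, then change the customer record afterwards.
row = place_order("order-1001", "cust-001", ["widget", "gadget"])
customer_table["cust-001"]["info:ship_to"] = "99 New Rd"
```

Reading the order back touches only order_table, and the order keeps the shipping address as it was when the order was placed. That is the trade-off of the duplication: single-table reads, at the cost of copies that the application must decide whether to refresh.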



On Jul 5, 2013, at 6:00 PM, Ian Varley <ivarley@salesforce.com> wrote:

> Sure. Maybe it's useful to talk about the functional aspect of relationships in models.
In an RDBMS, explicit relationships play a couple of roles:
> 
> - foreign key constraints: don't allow a tuple in relation A to point to a row in relation
B that doesn't exist
> - join optimization - knowledge of how two relations are logically connected can help
perform joins in a more optimal way
> 
> HBase, of course, provides neither of these features out of the box, so there is no difference
between an implied (weakly coupled, to use your term) relationship and something stronger.

> 
> Where it gets interesting is in the kind of denormalization you're talking about, where
information that properly belongs to one entity is copied into another one for efficiency's
sake, or to get some kind of atomicity protection. Your scenario below is doing this (duplicating
customer info in the order records). 
> 
> To be fair, relational DBs also force this kind of behavior sometimes, again for efficiency
reasons (we've all done it). HBase just starts there. :)
> 
> Ian
> 
> On Jul 5, 2013, at 4:22 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
> 
>> An entity is an entity. 
>> When you couple them you are saying that there's a relationship to them in the model.

>> 
>> What I am saying is that you can have an HBase model which is not a single table,
however when you look at your use case, you are querying data from a single table at a time.

>> 
>> Going back to the order entry system. You may have a customer table which maintains
all of the information about your customer yet you will also duplicate portions of the data
into the order system.  You still have other entities such as your orders, pick slips, shipping
and invoices. There won't be a hard or strong relationship between the customer table and
the order table. 
>> 
>> When you go to your ERD tool, you wouldn't show a strong coupling of the data. 
>> 
>> Does that make sense? 
>> 
>> On Jul 5, 2013, at 1:56 PM, Ian Varley <ivarley@salesforce.com> wrote:
>> 
>>> Mike, what do you mean by "you can have entities, except that they are not coupled"?
You mean, they have no relationship to each other? Or the relationship is defined elsewhere
(e.g. application code)? The concept of "coupling" seems a little overloaded and not as concise
here as "relationship". Two tuples in a database can have a wide number of relationships to
each other; the kinds of relationships that are actively supported differ between a traditional
RDBMS and HBase, and proper HBase design requires understanding these limitations precisely.
>>> 
>>> I'm not trying to be an ER<http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model>
apologist, there are a lot of ways in which it sucks. :) But if we want to evolve, we can't
just pretend there's no history here to build on.
>>> 
>>> Ian
>>> 
>>> On Jul 5, 2013, at 1:41 PM, Michael Segel wrote:
>>> 
>>> LOL...
>>> 
>>> Ian wrote:
>>> "But, something just occurred to me: just because your physical implementation
(HBase) doesn't support normalized entities and relationships doesn't mean your *problem*
doesn't have entities and relationships. :) An Author is one entity, a Title is another, and
a Genre is a third. Understanding how they interact is a prerequisite for translating into
a physical model that works well in HBase. (ERD modeling is not categorically the only way
to understand that, but I've yet to hear a credible alternative that doesn't boil down to
either ERD or "do it in your head").
>>> "
>>> 
>>> You can have entities, except that they are not coupled.
>>> 
>>> If you have a common key, then you may have a use for column families, it just
depends on your data and how you access your data.
>>> 
>>> It's not rocket science, but it's a non-trivial matter. Not doing it right may
mean that you are not going to get the most out of your system.
>>> 
>>> 
>>> On Jul 5, 2013, at 1:26 PM, Ian Varley <ivarley@salesforce.com> wrote:
>>> 
>>> But, something just occurred to me: just because your physical implementation
(HBase) doesn't support normalized entities and relationships doesn't mean your *problem*
doesn't have entities and relationships. :) An Author is one entity, a Title is another, and
a Genre is a third. Understanding how they interact is a prerequisite for translating into
a physical model that works well in HBase. (ERD modeling is not categorically the only way
to understand that, but I've yet to hear a credible alternative that doesn't boil down to
either ERD or "do it in your head").
>>> 
>>> 
>> 
> 

