hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Nested data structures examples for HBase
Date Fri, 12 Sep 2014 07:22:31 GMT


Let me put it a different way… 

Think of a sales invoice. 

You can have columns for invoice_id, customer_id, customer_name, customer_billing_address
(nested structure), customer_contact# (nested structure), ship_to (nested structure)…
And that’s just the header information.

Add to that the actual invoice line items… (row#, SKU#, description, qty, unit_price, line_price,
tax-code) … [Note: this is also nested]

How do you have a single column family to handle all of that? 

Again, when you look at designs with respect to a real use case, you start to see where they
fall apart. 

If we take a long look at what HBase is, and is not, we can start to see how we would want
to model the data and how to better organize the data. 

I don’t want to morph this thread into a more theoretical discussion on design, but this
isn’t a new thing.
Informix had project Arrowhead back in the late 90’s that got killed when Janet Perna bought
them.  Had that project not been killed, the landscape would be very different. 
(And that’s again another story. ;-) 

But I digress. 

The point I’m trying to make is that when you start to look at the data, wherever you would
have a Master/Slave (parent/child) relationship in the data, you can replace it with some sort
of array/list structure in a single column, since everything is a blob. (And again, there are
areas where you can impose more constraints on HBase and push it either toward a relational
model or toward a hierarchical model. That would again be a different discussion.)
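
To make that concrete, here is a rough sketch of the invoice above flattened into cells under a single column family, with each nested structure (billing address, line items) serialized as one blob. This is plain Python with no HBase client involved; the family name "d", the JSON encoding, the sample values and the helper names are all assumptions for illustration, not anything HBase prescribes.

```python
import json

# Hypothetical invoice; field names follow the ones listed earlier in this message.
invoice = {
    "invoice_id": "INV-0001",
    "customer_id": "C-42",
    "customer_name": "Example Customer",
    "customer_billing_address": {"street": "1 Main St", "city": "Springfield"},
    "line_items": [
        {"row": 1, "sku": "W-RED", "description": "widget, red",
         "qty": 2, "unit_price": 5.0, "line_price": 10.0, "tax_code": "A"},
        {"row": 2, "sku": "W-BLU", "description": "widget, blue",
         "qty": 1, "unit_price": 6.0, "line_price": 6.0, "tax_code": "A"},
    ],
}


def to_cells(record, family=b"d"):
    """Map each top-level field to one 'family:qualifier' -> bytes cell.

    Nested values (dicts and lists) are stored as a single JSON blob,
    so the whole invoice fits in one column family (an assumed layout).
    """
    cells = {}
    for name, value in record.items():
        qualifier = family + b":" + name.encode("utf-8")
        if isinstance(value, (dict, list)):
            cells[qualifier] = json.dumps(value, sort_keys=True).encode("utf-8")
        else:
            cells[qualifier] = str(value).encode("utf-8")
    return cells


def line_items_from_cells(cells, family=b"d"):
    """Decode the line-item blob back into a list of dicts."""
    return json.loads(cells[family + b":line_items"].decode("utf-8"))


cells = to_cells(invoice)
items = line_items_from_cells(cells)
```

Reading one invoice back is a single-row lookup plus a decode. Asking which invoices contain a given SKU means decoding the line_items blob of every row, which is the map/reduce trade-off mentioned in the quoted discussion below.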



On Sep 10, 2014, at 10:25 PM, Wilm Schumacher <wilm.schumacher@cawoom.com> wrote:

> Am 10.09.2014 um 22:25 schrieb Michael Segel:
>> Ok, but here’s the thing… you extrapolate the design out… each column
>> with a subordinate record will get its own CF.
> I disagree. Not by the proposed design. You could do it with one CF.
>> Simple examples can go
>> very bad when you move to real life.
> I agree.
>> Again you need to look at hierarchical databases and not think in
>> terms of relational. To give you a really good example… look at a
>> point of sale system in Pick/Revelation/U2 …
>> You are great at finding a specific customer’s order and what they
>> ordered. You suck at telling me how many customers ordered that
>> widget  in red.  during the past month’s promotion. (You’ll need to
>> do a map/reduce for that. )
> Correct, that's the downside of the suggestion. If you want to query
> something like that ("give all 'toplevel columns' that have this
> and that!"), you would have to run a map/reduce. Or you need something
> like an index. But that's a question only the thread owner can answer,
> because we don't know what he's trying to accomplish. If there is a
> chance that he wants to query something like that, my suggestion would be
> a bad plan.
> I think the thread owner now has 3 ideas for how to do what he was asking
> for, each with upsides and downsides. Now he has to decide what's the best plan
> for the future.
> Best wishes,
> Wilm
