HBase does version cells.
But I saw something of interest:
>>> In my test, there are 10,000 customers, each customer has 600 orders and each
order has 10 columns. The tall table approach results in 6 mil rows of 10 columns. The wide
table approach results in 10,000 rows of 6,000 columns. I'm using hbase 0.89.20100924 and
hadoop 0.20.2. I am adding the orders using a Put for each order, submitted in batches of
1000 as a list of Puts.
>>> Are there techniques to speed up inserts with the wide table approach that I
am perhaps overlooking?
OK, so you have 10K customers by 600 orders by 10 columns. So the 'tall' design has a row key
of customer_id plus order_id, with 10 columns in a single column family.
So you get 6 million rows, each written as a put of 10 columns.
Now if you do a 'wide' table...
Your row key is the 'customer_id' only. Each column is an order, so you write one column for
each order, and you have to figure out how to represent the order's fields inside that column.
(For example, your order of 10 items is represented by a string with a 'special character'
used as a field separator within the order.)
So you're doing one column write for each order, and you have a total of 10K rows.
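
To make the two layouts concrete, here is a rough sketch in plain Python (no HBase client involved). The separator character, the key format, and the function names are all assumptions made for illustration; they are not taken from the original test.

```python
SEP = "|"  # assumed separator; it must never appear inside a key or field value

def tall_cells(customer_id, order_id, order):
    """Tall layout: one row per order, one cell per order field."""
    row_key = f"{customer_id}{SEP}{order_id}"
    return [(row_key, field, value) for field, value in order.items()]

def wide_cell(customer_id, order_id, order):
    """Wide layout: one row per customer, one cell per order.

    The order's fields are packed into a single value string.
    """
    value = SEP.join(str(v) for v in order.values())
    return (customer_id, order_id, value)

# One hypothetical order with 10 fields.
order = {f"col{i}": f"v{i}" for i in range(10)}
tall = tall_cells("cust00042", "order0007", order)
wide = wide_cell("cust00042", "order0007", order)

# Totals for the test described above: 10,000 customers x 600 orders x 10 fields.
customers, orders_per_customer, fields = 10_000, 600, 10
tall_rows = customers * orders_per_customer   # 6,000,000 rows
tall_total_cells = tall_rows * fields         # 60,000,000 cells
wide_rows = customers                         # 10,000 rows
wide_total_cells = customers * orders_per_customer  # 6,000,000 cells
```

With this packing, the wide table writes a tenth as many cells as the tall one, but every read of a single order field has to split the packed string.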
Unless I'm missing something, part of the 'slowness' could be how you're writing your orders
to your wide table. There are a couple of other unknowns. Are you hashing your keys?
I mean, are you getting a bit of 'randomness' in your keys?
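
By 'hashing' I mean something along these lines: a minimal sketch, assuming MD5 and a 4-character hex prefix (both arbitrary choices for illustration, and `hashed_key` is a made-up helper name). Sequential customer ids then no longer sort next to each other, which tends to spread writes across regions, at the cost of losing range scans in customer-id order.

```python
import hashlib

def hashed_key(customer_id, prefix_len=4):
    """Prefix the row key with part of its own hash.

    The prefix is deterministic, so the same customer_id always maps
    to the same row key and can still be fetched with a direct get.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{customer_id}"

# Hypothetical sequential ids, as a loader might generate them.
keys = [hashed_key(f"cust{i:05d}") for i in range(5)]
```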
So what am I missing?
Mike
> Actually I don't think this is the problem as HBase versions cells, not rows, if I understand
correctly.
>
> > Perhaps slow wide table insert performance is related to row versioning? If I have
a customer row and keep adding order columns one by one, I'm thinking that there might be
a version kept of the row for every order I add? If I am simply inserting a new row for every
order, there is no versioning going on. Could this be causing performance problems?
> >
> >> It appears to be the same or better, not to derail my original question. The
much slower write performance will cause problems for me unless I can resolve that.
> >>
> >>
side, doing a lookup/scan?
> >>>
