HBase does version cells.
But I saw something of interest:
"
>>> In my test, there are 10,000 customers, each customer has 600 orders and each
order has 10 columns. The tall table approach results in 6 mil rows of 10 columns. The wide
table approach results is 10,000 rows of 6,000 columns. I'm using hbase 0.8920100924 and
hadoop 0.20.2. I am adding the orders using a Put for each order, submitted in batches of
1000 as a list of Puts.
>>>
>>> Are there techniques to speed up inserts with the wide table approach that I
am perhaps overlooking?
>>>
>>
> "
OK, so you have 10K x 600 x 10. The 'tall' design has a row key of customer_id + order_id with 10 columns in a single column family, so you get 6 million rows and 10 column puts per row.
Now if you do a 'wide' table...
Your row key is 'customer_id' only. Each column is an order, so you write one column for each order, and you have to figure out how to represent the order's fields within that column. (For example, an order of 10 fields could be represented as a single string with a 'special character' used as a field separator.) So you're doing one column write for each order, and you have a total of 10K rows.
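To make the two layouts concrete, here is a minimal sketch of how the row keys and cell values might be built. This is plain Java string handling, not the HBase client API; the key format (a '|' composite separator) and the '\u0001' field separator are my own assumptions for illustration, not something stated in the thread.

```java
import java.util.List;

public class KeyLayouts {
    // Tall design: one row per order, composite row key of customer + order id.
    // The 10 order fields then go into 10 columns of one family on this row.
    static String tallRowKey(String customerId, String orderId) {
        return customerId + "|" + orderId;  // '|' separator is an assumption
    }

    // Wide design: one row per customer, one column (qualifier) per order.
    // The 10 order fields are packed into a single cell value using a
    // separator character, per the "special character" idea above.
    static String wideValue(List<String> orderFields) {
        return String.join("\u0001", orderFields);  // '\u0001' is an assumption
    }

    public static void main(String[] args) {
        System.out.println(tallRowKey("cust42", "order0007"));
        System.out.println(wideValue(List.of("sku-123", "qty=2", "price=9.99")));
    }
}
```

With the tall layout you issue 10 small column writes per order row; with the wide layout you issue one wider column write per order against the same customer row.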
Unless I'm missing something, part of the 'slowness' could be how you're writing your orders to your wide table. There are a couple of other unknowns. Are you hashing your keys? That is, are you getting a bit of 'randomness' in your keys?
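By 'randomness' I mean something like salting the key with a hash prefix, so that lexicographically adjacent ids don't all land on the same region server. A rough sketch, using only the JDK; the choice of MD5 and a two-hex-char salt width is mine, for illustration only:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedKey {
    // Prefix the row key with two hex chars taken from its MD5 hash.
    // Sequential ids like cust00001, cust00002 then get scattered
    // prefixes instead of sorting next to each other.
    static String salted(String rowKey) throws NoSuchAlgorithmException {
        byte[] d = MessageDigest.getInstance("MD5")
                .digest(rowKey.getBytes(StandardCharsets.UTF_8));
        return String.format("%02x-%s", d[0], rowKey);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(salted("cust00001"));
        System.out.println(salted("cust00002"));
    }
}
```

The trade-off is that scans over a contiguous id range become harder, since the salt destroys the natural sort order.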
So what am I missing?
Mike
> Subject: Re: Insert into tall table 50% faster than wide table
> From: bryanck@gmail.com
> Date: Wed, 22 Dec 2010 18:24:05 -0800
> To: user@hbase.apache.org
>
> Actually, I don't think this is the problem, as HBase versions cells, not rows, if I understand correctly.
>
> On Dec 22, 2010, at 5:03 PM, Bryan Keller wrote:
>
> > Perhaps slow wide table insert performance is related to row versioning? If I have a customer row and keep adding order columns one by one, I'm thinking a version of the row might be kept for every order I add. If I am simply inserting a new row for every order, there is no versioning going on. Could this be causing performance problems?
> >
> > On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote:
> >
> >> It appears to be the same or better. Not to derail my original question, but the much slower write performance will cause problems for me unless I can resolve it.
> >>
> >> On Dec 22, 2010, at 3:52 PM, Peter Haidinyak wrote:
> >>
> >>> Interesting, do you know what the time difference would be on the other side, doing a lookup/scan?
> >>>
> >>> Thanks
> >>>
> >>> Pete
> >>>
> >>> -----Original Message-----
> >>> From: Bryan Keller [mailto:bryanck@gmail.com]
> >>> Sent: Wednesday, December 22, 2010 3:41 PM
> >>> To: user@hbase.apache.org
> >>> Subject: Insert into tall table 50% faster than wide table
> >>>
> >>> I have been testing a couple of different approaches to storing customer orders. One is a tall table, where each order is a row. The other is a wide table, where each customer is a row and orders are columns in the row. I am finding that inserts into the tall table, i.e. adding a row for every order, are roughly 50% faster than inserts into the wide table, i.e. adding a row for a customer and then adding columns for orders.
> >>>
> >>> In my test, there are 10,000 customers, each customer has 600 orders, and each order has 10 columns. The tall table approach results in 6 million rows of 10 columns; the wide table approach results in 10,000 rows of 6,000 columns. I'm using HBase 0.89.20100924 and Hadoop 0.20.2. I am adding the orders using a Put for each order, submitted in batches of 1000 as a list of Puts.
> >>>
> >>> Are there techniques to speed up inserts with the wide table approach that I am perhaps overlooking?
> >>>
> >>
> >
>
