hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Insert into tall table 50% faster than wide table
Date Thu, 23 Dec 2010 03:35:58 GMT

Ted,

Yes, 10K rows, one for each customer.
But if you write each order as a column, and there are 10 'columns' in an order, you have
to somehow serialize the 10 columns that represent the order so you get one column per order_id.
Of course you could still write out each column as order_id,order_column and then get your 6000
columns. If you did that, then you have the issue of your column id. Did you go column_id,order_id
or did you go order_id,column_id?
(One has to ask... :-)  )
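
To make the ordering question concrete: within a row and column family, HBase keeps cells sorted by the bytes of the column qualifier, so whichever part comes first in the qualifier decides what ends up next to what. Below is a rough Python sketch that only sorts byte strings and does not touch an HBase client. The zero-padded ids, the ':' separator and the function names are all made up for illustration.

```python
# Hypothetical sketch: compare the two qualifier layouts by sorting bytes,
# which mimics how cells are ordered inside one row of a column family.

def qualifier_order_first(order_id: int, column_id: int) -> bytes:
    # order_id,column_id (zero-padded so byte order matches numeric order)
    return f"{order_id:04d}:{column_id:02d}".encode("ascii")

def qualifier_column_first(order_id: int, column_id: int) -> bytes:
    # column_id,order_id
    return f"{column_id:02d}:{order_id:04d}".encode("ascii")

# Three orders with two fields each, just to keep the output small.
pairs = [(o, c) for o in range(1, 4) for c in range(1, 3)]

order_first = sorted(qualifier_order_first(o, c) for o, c in pairs)
column_first = sorted(qualifier_column_first(o, c) for o, c in pairs)

print(order_first)   # all fields of one order sit next to each other
print(column_first)  # the same field of every order sits next to each other
```

With order_id first, reading one whole order is a contiguous slice of the row. With column_id first, the contiguous slice is one field across all of a customer's orders.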

IMHO I'd elect to put the 10 columns of the order in a single column rather than write the
10 columns as individual columns.  But that's just me. :-)
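
A rough sketch of what I mean by a single column per order, in plain Python with no HBase client involved. The separator character, the function names and the sample field values are assumptions for illustration only; any serialization that round-trips would do.

```python
# Hypothetical sketch: pack the 10 fields of one order into a single cell value.

SEP = "\x1f"  # ASCII unit separator; assumed never to appear in field data

def pack_order(fields):
    """Join the order's fields into one byte string for a single cell."""
    if any(SEP in f for f in fields):
        raise ValueError("field contains the separator")
    return SEP.join(fields).encode("utf-8")

def unpack_order(value):
    """Split a packed cell value back into the order's fields."""
    return value.decode("utf-8").split(SEP)

# Numbers from Bryan's test.
CUSTOMERS, ORDERS_PER_CUSTOMER, FIELDS_PER_ORDER = 10_000, 600, 10

# Total cells written by each wide layout.
cells_one_per_field = CUSTOMERS * ORDERS_PER_CUSTOMER * FIELDS_PER_ORDER
cells_one_per_order = CUSTOMERS * ORDERS_PER_CUSTOMER

order = [f"value{i}" for i in range(FIELDS_PER_ORDER)]  # made-up field values
packed = pack_order(order)
```

The trade-off is that you can no longer read or update one field of an order without fetching and rewriting the whole packed value.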

-Mike


> Date: Wed, 22 Dec 2010 19:00:25 -0800
> Subject: Re: Insert into tall table 50% faster than wide table
> From: yuzhihong@gmail.com
> To: user@hbase.apache.org
> 
> > Each column is the order so you write one column for each order
> As stated earlier, wide table has 6,000 columns instead of 600. :-)
> 
> Bryan:
> Can you describe how you form row keys in each case ?
> 
> 
> On Wed, Dec 22, 2010 at 6:53 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> 
> >
> > HBase does version cells.
> >
> > But I saw something of interest:
> > "
> > >>> In my test, there are 10,000 customers, each customer has 600 orders
> > and each order has 10 columns. The tall table approach results in 6 mil rows
> > of 10 columns. The wide table approach results in 10,000 rows of 6,000
> > columns. I'm using hbase 0.89-20100924 and hadoop 0.20.2. I am adding the
> > orders using a Put for each order, submitted in batches of 1000 as a list of
> > Puts.
> > >>>
> > >>> Are there techniques to speed up inserts with the wide table approach
> > that I am perhaps overlooking?
> > >>>
> > >>
> > > "
> >
> > Ok, so you have 10K by 600 by 10. So the 'tall' design has a row key of
> > customer_id and Order_id with 10 columns in a single column family.
> > So you get 6 million rows and 10 column puts.
> >
> > Now if you do a 'wide' table...
> > Your row key is the 'customer_id' only. Each column is the order so you
> > write one column for each order and you have to figure out how you represent
> > your columns in the order.
> > (An example... your order of 10 items is represented by a string with a
> > 'special character' used as a column separator in the order.)
> > So you're doing one column write for each order and you have a total of 10K
> > rows.
> >
> > Unless I'm missing something, part of the 'slowness' could be how you're
> > writing your orders to your wide table. There are a couple of other unknowns.
> > Are you hashing your keys?
> > I mean are you getting a bit of 'randomness' in your keys?
> >
> > So what am I missing?
> >
> > -Mike
> >
> >
> > > Subject: Re: Insert into tall table 50% faster than wide table
> > > From: bryanck@gmail.com
> > > Date: Wed, 22 Dec 2010 18:24:05 -0800
> > > To: user@hbase.apache.org
> > >
> > > Actually I don't think this is the problem as HBase versions cells, not
> > rows, if I understand correctly.
> > >
> > > On Dec 22, 2010, at 5:03 PM, Bryan Keller wrote:
> > >
> > > > Perhaps slow wide table insert performance is related to row
> > versioning? If I have a customer row and keep adding order columns one by
> > one, I'm thinking that there might be a version kept of the row for every
> > order I add? If I am simply inserting a new row for every order, there is no
> > versioning going on. Could this be causing performance problems?
> > > >
> > > > On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote:
> > > >
> > > >> It appears to be the same or better, not to derail my original
> > question. The much slower write performance will cause problems for me
> > unless I can resolve that.
> > > >>
> > > >> On Dec 22, 2010, at 3:52 PM, Peter Haidinyak wrote:
> > > >>
> > > >>> Interesting, do you know what the time difference would be on the
> > other side, doing a lookup/scan?
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>> -Pete
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: Bryan Keller [mailto:bryanck@gmail.com]
> > > >>> Sent: Wednesday, December 22, 2010 3:41 PM
> > > >>> To: user@hbase.apache.org
> > > >>> Subject: Insert into tall table 50% faster than wide table
> > > >>>
> > > >>> I have been testing a couple of different approaches to storing
> > customer orders. One is a tall table, where each order is a row. The other
> > is a wide table where each customer is a row, and orders are columns in the
> > row. I am finding that inserts into the tall table, i.e. adding rows for
> > every order, are roughly 50% faster than inserts into the wide table, i.e.
> > adding a row for a customer and then adding columns for orders.
> > > >>>
> > > >>> In my test, there are 10,000 customers, each customer has 600 orders
> > and each order has 10 columns. The tall table approach results in 6 mil rows
> > of 10 columns. The wide table approach results in 10,000 rows of 6,000
> > columns. I'm using hbase 0.89-20100924 and hadoop 0.20.2. I am adding the
> > orders using a Put for each order, submitted in batches of 1000 as a list of
> > Puts.
> > > >>>
> > > >>> Are there techniques to speed up inserts with the wide table approach
> > that I am perhaps overlooking?
> > > >>>
> > > >>
> > > >
> > >
> >
> >
 		 	   		  