hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Schema design question - Hot Key concerns
Date Fri, 18 Nov 2011 19:04:56 GMT

Not sure if you'd consider this a 'big data' problem.

First, IMHO you're better off serving this out of a relational model. 

Having said that....

A 'hot row' in terms of reads isn't a bad thing, since it's in cache.

A 'hot row' in terms of updates is not a good thing, since the row has to be locked to update
it. Updates that all go to the same row will serialize on that lock and kill performance.
(Sorry, but this really sounds like a school-level homework question...)
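To make the row-lock point concrete, here is a toy in-process model. It is plain Python, not HBase's client API, and `RowLockedStore` and `mutate_row` are hypothetical names. Every mutation to a given row key runs under that row's single lock, so 200 concurrent bookings against one show all queue on the same lock. Bookings for different row keys would not contend with each other.

```python
import threading
from collections import defaultdict


class RowLockedStore:
    """Toy store that serializes mutations per row key (a hypothetical model
    of row-level atomicity, not the HBase client API)."""

    def __init__(self):
        self._rows = defaultdict(dict)               # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)    # row key -> that row's lock
        self._locks_guard = threading.Lock()         # protects lazy lock creation
        self.lock_acquisitions = defaultdict(int)    # how often each row was locked

    def _lock_for(self, row_key):
        with self._locks_guard:
            return self._locks[row_key]

    def mutate_row(self, row_key, mutate):
        """Apply mutate(row_dict) while holding the row's lock."""
        with self._lock_for(row_key):
            self.lock_acquisitions[row_key] += 1
            mutate(self._rows[row_key])

    def get(self, row_key, column, default=None):
        with self._lock_for(row_key):
            return self._rows[row_key].get(column, default)


store = RowLockedStore()
store.mutate_row("show-1", lambda row: row.__setitem__("SEATS_AVAILABLE", 1000))


def book_one():
    def mutate(row):
        row["SEATS_AVAILABLE"] -= 1                  # every booking touches the same cell
    store.mutate_row("show-1", mutate)


threads = [threading.Thread(target=book_one) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store.get("show-1", "SEATS_AVAILABLE"))        # 800
print(store.lock_acquisitions["show-1"])             # 201
```

The count comes out correct, but only because every one of the 201 mutations took the same lock one after another. Total throughput for the show is therefore capped by how fast that single lock can be passed around.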

In terms of design in HBase... 
Since you've already stated that the row will be hit multiple times for the same seat (reservations
and cancellations), you have a couple of problems:
1) Lack of ACID control. This makes it harder to design a reservation system.
2) You will have inventory problems due to rows being locked while people are querying the
system to purchase a seat.
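Without multi-row transactions, a common way to avoid overselling is an optimistic read-check-write loop on a single cell. The sketch below simulates that in plain Python. `Cell`, `compare_and_set`, and `reserve` are hypothetical names, loosely modeled on a single-row check-and-put; they are not HBase's actual client API. Every retry still targets the same hot cell, so this protects correctness but does nothing for throughput.

```python
import threading


class Cell:
    """Hypothetical single cell with compare-and-set (a stand-in for a
    single-row check-and-put; not the real HBase client API)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        """Write `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def reserve(seats_available, count, max_retries=100):
    """Optimistically decrement the seat count; never oversell."""
    for _ in range(max_retries):
        current = seats_available.get()
        if current < count:
            return False                 # not enough inventory left
        if seats_available.compare_and_set(current, current - count):
            return True                  # our read was still current; write applied
        # another buyer changed the value between our read and write: retry
    return False                         # gave up under heavy contention


seats = Cell(10)
results = []
results_lock = threading.Lock()


def buyer():
    ok = reserve(seats, 3)
    with results_lock:
        results.append(ok)


threads = [threading.Thread(target=buyer) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), seats.get())         # 3 1
```

Eight buyers each want 3 of 10 seats. Exactly three succeed, one seat is left over, and the count never goes negative.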

With respect to the schema, I would suggest that you rethink it. 

Just an example: the Rolling Stones selling out Cleveland Stadium for a Rock and Roll Hall
of Fame concert. You have 100,000 seats in the stadium plus luxury boxes, then add field seats...
read: a lot of people.

As has already been pointed out, this row becomes very hot while tickets go on sale. It
will become a bottleneck.
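A rough back-of-envelope bound shows why. The lock hold time and section count below are assumed figures, not measurements. The fragmented case also assumes the rows can actually be updated in parallel, for example because they sit in different regions.

```python
# Back-of-envelope ceiling for a single hot row. All inputs are assumptions.
SEATS = 100_000          # stadium capacity from the example above
LOCK_HOLD_MS = 2.0       # assumed time one update holds the row lock
SECTIONS = 50            # hypothetical number of independently locked rows

# If every update to the row is serialized, one row completes at most this many per second.
max_updates_per_sec = 1000 / LOCK_HOLD_MS
# Lower bound on sell-out time when every sale goes through that one row.
min_seconds_one_row = SEATS / max_updates_per_sec
# Same load spread evenly over SECTIONS rows that can be updated in parallel.
min_seconds_fragmented = min_seconds_one_row / SECTIONS

print(max_updates_per_sec, min_seconds_one_row, min_seconds_fragmented)  # 500.0 200.0 4.0
```

Cancellations and retries add to the load on the same row, so the real numbers would be worse than this floor.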

Also, assuming your SHOW_ID is really a composite of (venue, show, date), you will want to
fragment your rows further.
You're probably also going to want to split your data into two different tables and then
implement some ACID guarantees at your app level.
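A minimal sketch of both ideas follows, under stated assumptions. Inventory is keyed per (venue, show, date, section) instead of per show, and bookings go to a second table. The app performs the two writes and undoes the first if the second fails. The key layout, the `|` separator, the section granularity, and all function names are hypothetical, and the sample values are made up. This version is single-threaded and does not handle a crash between the two steps. Concurrent callers would also need a per-cell check-and-set such as the one sketched earlier.

```python
from dataclasses import dataclass, field

SEP = "|"  # hypothetical separator; pick one that cannot appear in the fields


def inventory_key(venue, show, date, section):
    """One inventory row per (venue, show, date, section) instead of one per show."""
    return SEP.join((venue, show, date, section))


def booking_key(venue, show, date, booking_id):
    """Bookings live in a separate table, one row per booking."""
    return SEP.join((venue, show, date, booking_id))


@dataclass
class Tables:
    inventory: dict = field(default_factory=dict)  # inventory row key -> seats available
    bookings: dict = field(default_factory=dict)   # booking row key -> seats booked


def reserve(tables, venue, show, date, section, booking_id, count,
            fail_booking_write=False):
    """App-level two-step write with a compensating undo (single-threaded sketch)."""
    inv = inventory_key(venue, show, date, section)
    if tables.inventory.get(inv, 0) < count:
        return False                                 # not enough seats in this section
    tables.inventory[inv] -= count                   # step 1: take seats from inventory
    try:
        if fail_booking_write:                       # hook to simulate a failed write
            raise OSError("simulated booking-table write failure")
        tables.bookings[booking_key(venue, show, date, booking_id)] = count  # step 2
    except OSError:
        tables.inventory[inv] += count               # compensate: return the seats
        return False
    return True


t = Tables()
t.inventory[inventory_key("stadium-a", "show-42", "2012-07-01", "sec-101")] = 4
print(reserve(t, "stadium-a", "show-42", "2012-07-01", "sec-101", "b1", 3))  # True
print(reserve(t, "stadium-a", "show-42", "2012-07-01", "sec-101", "b2", 1,
              fail_booking_write=True))                                      # False
print(t.inventory)  # {'stadium-a|show-42|2012-07-01|sec-101': 1}
```

Because sections are separate rows, a sale in one section no longer waits on the lock for every other section of the same show.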

Just a quick thought before I pop out for lunch...


> Date: Fri, 18 Nov 2011 10:02:54 -0800
> Subject: Re: Schema design question - Hot Key concerns
> From: selekt86@yahoo.com
> To: user@hbase.apache.org
> 
> One of the concerns I see with this schema is what happens if one of the
> shows becomes hot. Since you are maintaining your bookings at the column
> level, a hot "row" cannot be partitioned across regions. HBase is atomic
> at the row level, so different clients updating the same SHOW_ID
> will compete with each other. The throughput on a single row is
> limited because operations at the row level are atomic.
> 
> See this discussion on Quora:
> 
> http://www.quora.com/Is-there-a-limit-to-the-number-of-columns-in-an-HBase-row
> 
> I will let the experts comment further.
> 
> 
> On Fri, Nov 18, 2011 at 9:33 AM, Suraj Varma <svarma.ng@gmail.com> wrote:
> > I have an HBase schema design question that I wanted to discuss with the list.
> >
> > Let's say we have a "wide" table design that has a table with one
> > column family containing "show bookings", say.
> >
> > RowKey: SHOW_ID
> > Columns: SEATS_AVAILABLE, BOOKING_<#1>, BOOKING_<#2>, BOOKING_<#3>, etc.
> > Values: <remaining available seats>, <seats booked>, <seats booked>,
> > <seats booked>, etc.
> >
> > Each "SHOW_ID" will have a variable number of columns.
> >
> > Usage Pattern:
> > 1) Multiple clients / threads are constantly
> > creating/updating/deleting "bookings" and this results in a column
> > being added /updated/deleted to the row.
> > 2) The SEATS_AVAILABLE column needs to be atomically updated whenever
> > a corresponding BOOKING_<#> is added, updated or deleted.
> > 3) Clients update their own unique BOOKING columns (i.e. clients
> > update their own mutually exclusive BOOKING_<#> columns).
> > 4) Clients can concurrently update the SEATS_AVAILABLE column.
> > 5) Some SHOW_ID will be harder hit than other SHOW_IDs
> > 6) A TTL on the BOOKING columns will be set to expire them after some set time.
> > 7) We want to leverage the atomic update at "row level" that HBase
> > provides for atomically updating the related columns.
> >
> > So - we are visualizing this as sort of an "equalizer" graphic on a
> > stereo where each row is constantly varying in terms of columns added
> > & removed. The SEATS_AVAILABLE value goes up & down correspondingly.
> >
> > Questions / Notes:
> > 1) Could this lead to a hot key / hot row scenario? The columns being
> > updated are mutually exclusive except for SEATS_AVAILABLE. Or
> > would this be very low overhead given that only one column is really
> > being "updated" by multiple client threads?
> >
> > 2) The alternative we had explored was tall table where each BOOKING
> > is a separate row (SHOW_ID-BOOKING-<#> would be the key) ... however,
> > in this case, we won't be able to atomically update the
> > SEATS_AVAILABLE column at the same time.
> >
> > 3) In terms of "row locking", what is the granularity? That is, when is
> > the row-level lock engaged to make the update atomic? Are the column
> > updates made on the side and "swapped" in under the row-level lock, or
> > is the row-level lock held for the full duration of the update?
> >
> > 4) I think the concern is whether this design is scalable as the
> > number of clients keeps increasing over time ...
> >
> > 5) Any other suggestions on how the hot row key scenario (if real) can be
> > sidestepped?
> >
> > Thanks,
> > --Suraj
> >
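The wide-row design in the quoted question can be modeled as follows. One SHOW_ID row holds SEATS_AVAILABLE plus one BOOKING_<#> column per booking. Each add, update, or delete of a booking adjusts SEATS_AVAILABLE in the same row-locked step, so booked seats plus available seats always equal capacity. `ShowRow`, `set_booking`, and `invariant_holds` are hypothetical names, and the single lock only mimics row-level atomicity; this is not the HBase client API. The TTL expiry from point 6 is not modeled.

```python
import threading


class ShowRow:
    """Hypothetical model of the proposed wide row; one lock per row mimics
    HBase's row-level atomicity (not the HBase client API)."""

    def __init__(self, show_id, capacity):
        self.show_id = show_id
        self.capacity = capacity
        self.columns = {"SEATS_AVAILABLE": capacity}
        self._lock = threading.Lock()

    def set_booking(self, booking_no, seats):
        """Add or update (seats > 0), or delete (seats == 0), BOOKING_<booking_no>,
        adjusting SEATS_AVAILABLE in the same atomic step."""
        col = f"BOOKING_{booking_no}"
        with self._lock:
            old = self.columns.get(col, 0)
            delta = seats - old
            if delta > self.columns["SEATS_AVAILABLE"]:
                return False                     # would oversell; reject the change
            self.columns["SEATS_AVAILABLE"] -= delta
            if seats == 0:
                self.columns.pop(col, None)      # cancellation removes the column
            else:
                self.columns[col] = seats
            return True

    def invariant_holds(self):
        with self._lock:
            booked = sum(v for k, v in self.columns.items() if k.startswith("BOOKING_"))
            return booked + self.columns["SEATS_AVAILABLE"] == self.capacity


row = ShowRow("SHOW_1", 10)
results = [
    row.set_booking(1, 4),   # new booking of 4 seats
    row.set_booking(2, 5),   # new booking of 5 seats
    row.set_booking(1, 6),   # grow booking 1 by 2, but only 1 seat is left
    row.set_booking(2, 0),   # cancel booking 2
]
print(results)                 # [True, True, False, True]
print(row.columns)             # {'SEATS_AVAILABLE': 6, 'BOOKING_1': 4}
print(row.invariant_holds())   # True
```

Every call takes the same lock, whichever booking column it touches. That shared SEATS_AVAILABLE cell is what makes the row hot, as the replies above point out.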