hadoop-common-dev mailing list archives

From "Arkady Borkovsky" <ark...@inktomi.com>
Subject Re: HBase Design Ideas, Part II
Date Fri, 19 May 2006 22:57:31 GMT

I like the initiative and most of the ideas.
It looks like the design is very record oriented.

Some potential applications of HBase-like functionality are 
batch-oriented: select a (large) subset of records, process a subset of 
their columns (often a small one), and then add either new columns or 
new versions of existing columns.
Several applications like this may be running on the same "table".

Think, for example, about query log processing, where the key is a 
search query and the daily stats are represented as columns (several 
columns per day), plus columns for data aggregated by week, by month, 
and so on.
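
To make the layout concrete, here is a rough sketch of the column 
naming I have in mind (all names are made up for illustration; nothing 
here is tied to the current proposal):

  // Hypothetical column layout for query-log stats: one row per search
  // query, one column per stat per day, plus aggregate columns.
  public class QueryLogLayout {
    static String daily(String stat, String yyyymmdd) {
      return "stats_daily:" + stat + ":" + yyyymmdd;  // e.g. stats_daily:count:20060519
    }
    static String weekly(String stat, String yyyyww) {
      return "stats_weekly:" + stat + ":" + yyyyww;   // aggregated by week
    }
    static String monthly(String stat, String yyyymm) {
      return "stats_monthly:" + stat + ":" + yyyymm;  // aggregated by month
    }
    public static void main(String[] args) {
      String rowKey = "hadoop hbase";                 // the search query is the row key
      System.out.println(rowKey + " -> " + daily("count", "20060519"));
    }
  }

A batch job would then read only the daily columns it needs and write 
back new weekly or monthly columns, or new versions of them.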

How well does the proposal address this kind of use?

-- ab

On May 15, 2006, at 9:54 PM, Michael Cafarella wrote:

> Hi everyone,
> My previous mail mentioned a bunch of design ideas that were mainly
> lifted from Jeff Dean's BigTable talk.  BigTable seems like a useful
> way to do large-scale row storage, and their decisions largely seem
> like the right ones.
> However, BigTable still leaves some things on the table.  Items to
> improve include a query language and multi-row locking, among
> other things.
> Dean said explicitly in his talk that they wanted to avoid multirow
> locking because it's complicated, error-prone, and maybe not necessary.
> He's right on at least the first two, and maybe the third.
> Multiple row locks are useful when you're making a change to
> several rows that should be atomic; you want all the changes
> or none of the changes.  They're also used in traditional databases
> if you want to perform an expensive read operation (like a
> multiway join) and you want to make sure the results don't
> get modified while you're reading.
> Distributed lock acquisition is very hard to do.  It's bug-prone
> and often has very weird performance ramifications.  It's
> difficult to get working, difficult to tune, difficult to everything.
> Here are a few ideas on what to do:
> 1)  Suck it up and have the client acquire locks on multiple
> HRegionServers simultaneously.  All clients would have to
> agree to acquire locks according to some global ordering to
> avoid deadlock.  HRegions would not be allowed to migrate
> to a new server if locked.
> If this is a rare circumstance, a better approach would be
> to have a dedicated "lock acquirer" through which clients
> make requests.  It doesn't help the theoretical problem here,
> but it would make debugging an awful lot easier.
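> To make "global ordering" concrete, here is a rough sketch (the
> RegionLock interface is hypothetical; nothing like it exists yet):
>
>   import java.util.*;
>
>   // Sketch: acquire locks on several regions in one canonical order
>   // (sorted by region name) so that two clients can never deadlock.
>   public class OrderedLocking {
>     interface RegionLock {
>       String regionName();
>       void lock();      // blocks until the HRegionServer grants it
>       void unlock();
>     }
>
>     static void lockAll(List<RegionLock> locks) {
>       Collections.sort(locks, new Comparator<RegionLock>() {
>         public int compare(RegionLock a, RegionLock b) {
>           return a.regionName().compareTo(b.regionName());
>         }
>       });
>       for (RegionLock l : locks) {
>         l.lock();       // everyone locks in the same order, so no cycles
>       }
>     }
>
>     static void unlockAll(List<RegionLock> locks) {
>       for (RegionLock l : locks) {
>         l.unlock();
>       }
>     }
>   }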
> 2)  In the case of long-lasting read operations, we can
> use versioning to guarantee consistency.  If each row is
> annotated with an edit timestamp, and we know that there
> is sufficient version history available, the long-lasting job
> can run over a specific version only.
> Edits can continue to be made to the database while the
> read-only job is ongoing.  The operation is performed over
> the database as of the time the task was submitted.
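> As a sketch of the client-side contract I have in mind (the
> interface and method names here are made up):
>
>   // Sketch: a long-running read pinned to the moment the job started.
>   // Later edits get newer timestamps and simply never show up.
>   interface VersionedTable {
>     // Return the value of (row, column) as of 'timestamp', i.e. the
>     // newest version whose edit time is <= timestamp.
>     byte[] get(String row, String column, long timestamp);
>   }
>
>   class SnapshotReader {
>     private final VersionedTable table;
>     private final long snapshotTime;    // fixed when the job is submitted
>
>     SnapshotReader(VersionedTable table) {
>       this.table = table;
>       this.snapshotTime = System.currentTimeMillis();
>     }
>
>     byte[] read(String row, String column) {
>       // Every read in this job sees the table as of snapshotTime,
>       // no matter how long the job runs or what edits arrive meanwhile.
>       return table.get(row, column, snapshotTime);
>     }
>   }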
> 3) In the case of multiple row updates, we may be able to
> use different edit semantics to avoid locking.  For example,
> consider that we want to add a single column/value pair to
> multiple rows.  We want this to happen atomically, so that
> all of the rows get the value or none of them do.
> If it's just an add, then we don't need to lock the rows at
> all; the add will always succeed, even if other writes
> intervene. Traditionally there's been no distinction
> among data "updates", so they all require locking.  If we
> can get a client to adjust the update semantics slightly,
> then the locking can be much more relaxed.
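> To make the distinction concrete, here is a sketch of the two kinds
> of write I mean (the interface is hypothetical):
>
>   import java.util.List;
>
>   // Sketch: two flavors of multi-row write.
>   interface RowUpdates {
>     // "Edit": overwrite an existing value.  If several rows must
>     // change together, this is where multi-row locking comes in.
>     void edit(String row, String column, byte[] newValue);
>
>     // "Add": append a new column/value pair (or a new version).
>     // It always succeeds, even if other writes intervene, so no
>     // multi-row lock is needed.
>     void add(String row, String column, byte[] value);
>   }
>
>   class MultiRowAdd {
>     // Add the same column/value pair to every row by looping;
>     // relaxed "add" semantics make lock acquisition unnecessary.
>     static void addToAll(RowUpdates table, List<String> rows,
>                          String column, byte[] value) {
>       for (String row : rows) {
>         table.add(row, column, value);
>       }
>     }
>   }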
> I'd say that "add" or "append" semantics are likely to be
> at least as common as "edit" semantics.
> Can you think of the family of edit semantics you'd like
> to see offered here?
> Also, how useful do you think a general-purpose query language
> would be for HBase?  It would be fairly straightforward to implement,
> for example, a poor man's version of SQL that has different locking
> and update behavior (and which chucks out the more exotic elements).
> This might be compiled into a piece of code that is executed
> immediately, or it might be transformed into a long-lasting mapreduce
> job.
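> For flavor, here is a sketch of how the front end might dispatch
> such a query (the query text and both back ends are made up):
>
>   // Sketch: a "poor man's SQL" statement is either executed right
>   // away or compiled into a long-lasting mapreduce job.
>   interface QueryBackend {
>     void runImmediately(String query);     // small scans, point lookups
>     void submitAsMapReduce(String query);  // whole-table batch work
>   }
>
>   class QueryDispatcher {
>     private final QueryBackend backend;
>
>     QueryDispatcher(QueryBackend backend) {
>       this.backend = backend;
>     }
>
>     void execute(String query, boolean touchesWholeTable) {
>       // e.g. "SELECT anchor:text FROM webtable WHERE row LIKE 'com.%'"
>       if (touchesWholeTable) {
>         backend.submitAsMapReduce(query);  // becomes a mapreduce job
>       } else {
>         backend.runImmediately(query);     // executed immediately
>       }
>     }
>   }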
> I have a few ideas for such a language, but I'm worried it's getting
> a little far afield from what we're interested in for Hadoop.
> --Mike
