hbase-dev mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: Limited cross row transactions
Date Wed, 18 Jan 2012 01:15:37 GMT
>
> one could have a table that hosts a parent child relationship in a single
> table, by prefixing all child row keys with the parent row key.
> Now it is possible to presplit the table (or use a custom local balancer)
> so that child rows are always in the same region as the parent rows.
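
A rough sketch of the quoted scheme, in plain Python with no HBase client
involved. The "/" separator, the function names and the sample split keys are
all made up for illustration. It assumes a region covers the half-open key
range [start key, next start key), and that split points are placed only at
parent keys:

```python
from bisect import bisect_right

SEP = b"/"  # hypothetical separator between parent key and child id


def child_key(parent_key: bytes, child_id: bytes) -> bytes:
    """Build a child row key by prefixing it with the parent row key."""
    return parent_key + SEP + child_id


def region_index(split_keys: list, row_key: bytes) -> int:
    """Index of the region holding row_key, given sorted split keys.

    A row equal to a split key belongs to the region that starts at that key.
    """
    return bisect_right(split_keys, row_key)


def same_region(split_keys: list, row_keys: list) -> bool:
    """True if every row key falls into one region (client-side pre-check)."""
    return len({region_index(split_keys, k) for k in row_keys}) <= 1


# Split only at parent boundaries so children stay with their parent.
splits = [b"parent2", b"parent3"]
batch_ok = [b"parent2", child_key(b"parent2", b"x")]
batch_bad = [child_key(b"parent1", b"a"), b"parent2"]
print(same_region(splits, batch_ok), same_region(splits, batch_bad))
```

This only holds as long as the regions are never split again inside a parent's
key range, which is the constraint discussed further down the thread.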


I thought BigTable/Megastore handled this kind of thing by putting
everything into a single row, with the entity group id as the HBase row key.
Then you add all parent and child values to the same HBase row by pushing
their original row keys into the qualifiers.  You build the qualifiers by
concatenating the table name with the original row key.

HBase should handle the arbitrarily wide rows and prevent the row from
splitting between regions.  Having the table name as a prefix of each
qualifier adds a lot of metadata, but good prefix compression should
eliminate that.
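
To make the layout concrete, here is a minimal sketch in plain Python (no
HBase client calls). The NUL separator, the function names and the sample
data are assumptions for illustration, as is appending the original column
name to the qualifier; the scheme above only specifies table name plus
original row key:

```python
from collections import defaultdict

# Hypothetical separator; assumes it never occurs in table names or row keys.
SEP = b"\x00"


def make_qualifier(table: bytes, row_key: bytes, column: bytes) -> bytes:
    """Concatenate table name, original row key and column into one qualifier."""
    return SEP.join((table, row_key, column))


def to_entity_group_rows(cells):
    """Fold (group_id, table, row_key, column, value) cells into wide rows.

    The result maps entity group id (used as the row key) to a dict of
    qualifier -> value, so each entity group lives in exactly one row.
    """
    rows = defaultdict(dict)
    for group_id, table, row_key, column, value in cells:
        rows[group_id][make_qualifier(table, row_key, column)] = value
    return dict(rows)


cells = [
    (b"user42", b"user", b"user42", b"name", b"Ann"),
    (b"user42", b"order", b"user42-0001", b"total", b"19.99"),
    (b"user7", b"user", b"user7", b"name", b"Bob"),
]
rows = to_entity_group_rows(cells)
```

Since everything for "user42" ends up in a single row, a single-row atomic
operation covers the parent and its children without any constraint on where
the regions are split.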


On Tue, Jan 17, 2012 at 4:57 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> >> so that child rows are always in the same region as the parent rows
> Should the user expect abnormal growth for certain parent(s) ?
>
> I think even HFile v2 has a limit on the file size beyond which operations
> would become less efficient.
>
> On Tue, Jan 17, 2012 at 4:48 PM, lars hofhansl <lhofhansl@yahoo.com>
> wrote:
>
> > Yes, it's a hard constraint, but the building blocks are there.
> > Users can disable automatic splitting and pre-split the table.
> >
> > For example one could have a table that hosts a parent child relationship
> > in a single table, by prefixing all child row keys with the parent
> > row key.
> > Now it is possible to presplit the table (or use a custom local balancer)
> > so that child rows are always in the same region as the parent rows.
> > And then it would be possible to do cross parent/child transactions.
> >
> > Using the same scheme it is possible to do consistent parent/child indexes
> > (consistent indexes within the same parent prefix).
> > (I just made this up, but this is somewhat similar to the Megastore
> > design, I think)
> >
> >
> > Anyway, I set out asking whether this would be a useful endeavor; it seems
> > the answer is a resounding "maybe". :)
> >
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Mikael Sitruk <mikael.sitruk@gmail.com>
> > To: dev@hbase.apache.org
> > Cc:
> > Sent: Tuesday, January 17, 2012 3:07 PM
> > Subject: Re: Limited cross row transactions
> >
> > Well, I understand the limitation now; requiring the rows to be in the same
> > region is a really hard constraint.
> > Even being on the same RS is not enough, because after a restart regions
> > may be assigned differently, and part of the data may then be in one region
> > under server A and the other part under server B.
> >
> > Perhaps we need use cases for a better understanding, and perhaps to find
> > an alternative.
> >
> > The first use case I was thinking of is as follows:
> > I need to insert data with different access criteria, but the data should
> > be inserted atomically.
> > In an RDBMS I would have two tables, insert data in the first one with key #1
> > and then in the second one with key #2, then commit.
> > In HBase I need to use different column families with key #1 (for atomicity)
> > and then manage a kind of secondary index mapping key #2 to key #1 (perhaps
> > via a coprocessor) to have quick access to the data of key #2.
> > With cross-row transactions, I would think of using different keys in the
> > same table (and probably different CFs too), without needing a secondary
> > index, but again, with the limitation it does not seem to be easily
> > feasible.
> >
> > Mik.
> >
> > On Wed, Jan 18, 2012 at 12:22 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > People rely on RDBMS for the transaction support.
> > >
> > > Consider the following example:
> > > A highly de-normalized schema puts related users in the same region, where
> > > this 'limited cross row transactions' works.
> > > After some time, the region has to be split (maybe due to good business
> > > condition).
> > > What should the HBase user do now ?
> > >
> > > Cheers
> > >
> > > On Tue, Jan 17, 2012 at 2:13 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > >
> > > > Ted - My 2 cents as a user.
> > > > The user should know what he is doing. This is like a 'delete' operation:
> > > > it is less intuitive than the original delete in an RDBMS, and the same
> > > > will be true for this light transaction.
> > > > If the transaction fails because it crosses region servers, then the
> > > > user's design was wrong.
> > > > If the transaction fails because of concurrent access, then he should be
> > > > able to re-read and reprocess his request.
> > > > The only problem is how to make sure in advance that the different rows
> > > > will be in the same RS.
> > > >
> > > > Lars - is the limitation at the region or at the region server? It was
> > > > not so clear.
> > > >
> > > > Mikael.S
> > > >
> > > > On Tue, Jan 17, 2012 at 11:52 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > Back to original proposal:
> > > > > If client side grouping reveals that the batch of operations cannot be
> > > > > supported by 'limited cross row transactions', what should the user do?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Tue, Jan 17, 2012 at 1:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > Whether Omid fits the bill is open to discussion.
> > > > > >
> > > > > > We should revisit HBASE-2315 and provide the support Flavio, et al
> > > > > > need.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 17, 2012 at 1:41 PM, Lars George <lars.george@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Ted,
> > > > > >>
> > > > > > >> Wouldn't Omid (https://github.com/yahoo/omid) help there? Or is that
> > > > > > >> too broad? Just curious.
> > > > > >>
> > > > > >> Lars
> > > > > >>
> > > > > >> On Jan 17, 2012, at 4:36 PM, Ted Yu wrote:
> > > > > >>
> > > > > > >> > Can we collect use cases for 'limited cross row transactions' first?
> > > > > > >> >
> > > > > > >> > I have been thinking about (unlimited) multi-row transaction support
> > > > > > >> > in HBase. It may not be a one-man task. But we should definitely
> > > > > > >> > implement it someday.
> > > > > >> >
> > > > > >> > Cheers
> > > > > >> >
> > > > > > >> > On Tue, Jan 17, 2012 at 1:27 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> > > > > >> >
> > > > > > >> >> I just committed HBASE-5203 (together with HBASE-3584 this implements
> > > > > > >> >> atomic row operations).
> > > > > > >> >> Although a relatively small patch, it lays the groundwork for
> > > > > > >> >> heterogeneous operations in a single WALEdit.
> > > > > > >> >>
> > > > > > >> >> The interesting part is that even though the code enforces that the
> > > > > > >> >> atomic operation be for a single row, this is not required.
> > > > > > >> >> It is enough if all involved KVs reside in the same region.
> > > > > > >> >>
> > > > > > >> >> I am not saying that we should add any high level concept to HBase
> > > > > > >> >> (such as the EntityGroups of Megastore).
> > > > > > >> >>
> > > > > > >> >> But, with a slight addition to the API (allowing a grouping of multiple
> > > > > > >> >> row operations) client applications have all the building blocks to do
> > > > > > >> >> limited cross row atomic operations.
> > > > > > >> >> The client application would be responsible for correctly
> > > > > > >> >> pre-splitting the table, or else a custom balancer has to be provided.
> > > > > > >> >>
> > > > > > >> >> The operation would fail if the regionserver determines that it would
> > > > > > >> >> need data from multiple region servers.
> > > > > > >> >>
> > > > > > >> >> I think this needs at least minimal support from HBase and cannot be
> > > > > > >> >> done (efficiently or without adding more moving parts) by a client API
> > > > > > >> >> only.
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >> Comments? Is this worth pursuing? If so, I'll file a jira and provide a
> > > > > > >> >> patch.
> > > > > >> >>
> > > > > >> >> Thanks.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> -- Lars
> > > > > >> >>
> > > > > >> >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Mikael.S
> > > >
> > >
> >
> >
> >
> > --
> > Mikael.S
> >
> >
>
