hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Limited cross row transactions
Date Wed, 18 Jan 2012 00:57:27 GMT
>> so that child rows are always in the same region as the parent rows
Should the user expect abnormal growth for certain parent(s) ?

I think even HFile v2 has a limit on the file size beyond which operations
would become less efficient.

On Tue, Jan 17, 2012 at 4:48 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> Yes, it's a hard constraint, but the building blocks are there.
> User can disable automatic splitting and pre-split the table.
>
> For example, one could have a table that hosts a parent/child relationship
> in a single table, by prefixing all child row keys with the parent
> row key.
> Now it is possible to presplit the table (or use a custom local balancer)
> so that child rows are always in the same region as the parent rows.
> And then it would be possible to do cross parent/child transactions.
>
> Using the same scheme it is possible to do consistent parent/child indexes
> (consistent indexes within the same parent prefix).
> (I just made this up, but this is somewhat similar to the Megastore
> design, I think)
>
>
> Anyway, I set out asking whether this would be a useful endeavor; it seems
> the answer is a resounding "maybe". :)
>
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Mikael Sitruk <mikael.sitruk@gmail.com>
> To: dev@hbase.apache.org
> Cc:
> Sent: Tuesday, January 17, 2012 3:07 PM
> Subject: Re: Limited cross row transactions
>
> Well, I understand the limitation now; asking to be in the same region is
> a really hard constraint.
> Even if this is on the same RS this is not enough, because after a restart,
> regions may be allocated differently and now part of the data may be in one
> region under server A and the other part under server B.
>
> Well, perhaps we need a use case for better understanding, and perhaps for
> finding an alternative.
>
> The first use case I was thinking of is as follows:
> I need to insert data with different access criteria, but the data
> should be inserted in an atomic way.
> In an RDBMS I would have two tables, insert data in the first one with key #1
> and then in the second one with key #2, then commit.
> In HBase I need to use different column families with key #1 (for atomicity),
> then manage a kind of secondary index to map key #2 to key #1 (perhaps
> via a coprocessor) to have quick access to the data of key #2.
> With cross-row transactions, I would think of using different keys under the
> same table (and probably different CFs too), without the need for a secondary
> index, but again, with the limitation it does not seem to be easily
> feasible.
>
> Mik.
>
> On Wed, Jan 18, 2012 at 12:22 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > People rely on RDBMS for the transaction support.
> >
> > Consider the following example:
> > A highly de-normalized schema puts related users in the same region where
> > this 'limited cross row transactions' works.
> > After some time, the region has to be split (maybe due to good business
> > condition).
> > What should the HBase user do now ?
> >
> > Cheers
> >
> > On Tue, Jan 17, 2012 at 2:13 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> >
> > > Ted - My 2 cents as a user.
> > > > The user should know what he is doing. This is like a 'delete'
> > > > operation: it is less intuitive than the original delete in an RDBMS,
> > > > and the same will be true for this light transaction.
> > > > If the transaction fails because it crosses region servers, then the
> > > > user's design was wrong.
> > > > If the transaction fails because of concurrent access, then he should
> > > > be able to re-read and reprocess his request.
> > > > The only problem is how to make sure in advance that the different rows
> > > > will be in the same RS?
> > > >
> > > > Lars - is the limitation at the region or at the region server? It
> > > > was not so clear.
> > >
> > > Mikael.S
> > >
> > > On Tue, Jan 17, 2012 at 11:52 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Back to original proposal:
> > > > > If client side grouping reveals that the batch of operations cannot
> > > > > be supported by 'limited cross row transactions', what should the
> > > > > user do?
> > > >
> > > > Cheers
> > > >
> > > > On Tue, Jan 17, 2012 at 1:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > Whether Omid fits the bill is open to discussion.
> > > > >
> > > > > > We should revisit HBASE-2315 and provide the support Flavio et al.
> > > > > > need.
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > > On Tue, Jan 17, 2012 at 1:41 PM, Lars George <lars.george@gmail.com> wrote:
> > > > >
> > > > >> Hi Ted,
> > > > >>
> > > > > >> Wouldn't Omid (https://github.com/yahoo/omid) help there? Or is
> > > > > >> that too broad? Just curious.
> > > > >>
> > > > >> Lars
> > > > >>
> > > > >> On Jan 17, 2012, at 4:36 PM, Ted Yu wrote:
> > > > >>
> > > > > >> > Can we collect use cases for 'limited cross row transactions'
> > > > > >> > first?
> > > > > >> >
> > > > > >> > I have been thinking about (unlimited) multi-row transaction
> > > > > >> > support in HBase. It may not be a one-man task. But we should
> > > > > >> > definitely implement it someday.
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > > >> > On Tue, Jan 17, 2012 at 1:27 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> > > > >> >
> > > > > >> >> I just committed HBASE-5203 (together with HBASE-3584 this
> > > > > >> >> implements atomic row operations).
> > > > > >> >> Although a relatively small patch, it lays the groundwork for
> > > > > >> >> heterogeneous operations in a single WALEdit.
> > > > > >> >>
> > > > > >> >> The interesting part is that even though the code enforced the
> > > > > >> >> atomic operation to be for a single row, this is not required.
> > > > > >> >> It is enough if all involved KVs reside in the same region.
> > > > > >> >>
> > > > > >> >> I am not saying that we should add any high level concept to
> > > > > >> >> HBase (such as the EntityGroups of Megastore).
> > > > > >> >>
> > > > > >> >> But, with a slight addition to the API (allowing a grouping of
> > > > > >> >> multiple row operations) client applications have all the
> > > > > >> >> building blocks to do limited cross row atomic operations.
> > > > > >> >> The client application would be responsible for either correctly
> > > > > >> >> pre-splitting the table, or a custom balancer has to be provided.
> > > > > >> >>
> > > > > >> >> The operation would fail if the regionserver determines that it
> > > > > >> >> would need data from multiple region servers.
> > > > > >> >>
> > > > > >> >> I think this needs at least minimal support from HBase and
> > > > > >> >> cannot be done (efficiently or without adding more moving parts)
> > > > > >> >> by a client API only.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> Comments? Is this worth pursuing? If so, I'll file a jira and
> > > > > >> >> provide a patch.
> > > > >> >>
> > > > >> >> Thanks.
> > > > >> >>
> > > > >> >>
> > > > >> >> -- Lars
> > > > >> >>
> > > > >> >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Mikael.S
> > >
> >
>
>
>
> --
> Mikael.S
>
>
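
The parent/child key scheme Lars describes in the quoted message can be sketched in a few lines. This is a minimal illustration in plain Python, not HBase client code: the helper names (child_key, region_index, same_region) and the 0x00 separator are assumptions made up for the example. The lookup only mimics the rule that a region covers the key range [start key, end key) in unsigned byte order. If the table is pre-split only at parent keys, a parent row and all rows prefixed with its key land in the same region, so a batch over them would pass a "single region" check.

```python
from bisect import bisect_right

SEP = b"\x00"  # hypothetical separator between parent key and child id


def child_key(parent, child):
    """Build a child row key by prefixing it with the parent row key."""
    return parent + SEP + child


def region_index(split_keys, row):
    """Return the index of the region that would hold `row`.

    `split_keys` must be sorted. Region 0 covers keys below the first
    split key; region i covers [split_keys[i-1], split_keys[i]).
    Python compares bytes lexicographically by unsigned byte value.
    """
    return bisect_right(split_keys, row)


def same_region(split_keys, rows):
    """True if every row in the batch maps to one region."""
    return len({region_index(split_keys, r) for r in rows}) <= 1
```

For example, with split keys [b"parent2", b"parent3"], the rows b"parent2" and child_key(b"parent2", b"a") map to the same region, while a batch mixing b"parent1" with a child of b"parent2" does not. In this sketch a check like same_region stands in for the server-side test that would reject such a batch. Ted's concern above still applies: a parent with very many children makes its region grow, and splitting that region anywhere but at a parent key breaks the guarantee.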
