hbase-dev mailing list archives

From Suraj Varma <svarma...@gmail.com>
Subject Re: Question on Coprocessors and Atomicity
Date Sat, 10 Dec 2011 18:16:14 GMT
I've been following HBASE-4605 with interest and I'm going through the
patches. I don't want to take away from all the hard work that's gone
into it ...

The more I think of it, I'm wondering how the Constraint can be
enforced without enforcing atomicity.

From the jira description, the intention of this feature is:
"Essentially, people would implement a 'Constraint' interface for
checking keys before they are put into a table. Puts that are valid
get written to the table, but if not, people can throw an
exception that gets propagated back to the client explaining why the
put was invalid."

If the row lock is released between the time the coprocessor finishes
"preXXXX" checks and the core mutation method is invoked (as has been
discussed in this thread), how can the Constraint be ensured? If two
requests are being processed in parallel, there is every possibility
that both requests pass the "Constraint" check individually, but break
it together (e.g. even simple checks like column value == 10 would
break if two requests fire concurrently).

So - I'm questioning whether a pure Coprocessor implementation alone
would be sufficient?

I think we'll need an approach that performs the constraint check and
the mutation _atomically_, either by
a) taking a row lock and passing that into put / checkAndPut, or
b) referencing & checking the constraint directly from within the put
/ checkAndPut methods (like we do with the comparator, for instance)
under a row lock.
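
Option (b) could look roughly like the following sketch, written in plain
Java rather than against the real HRegion code. The `Constraint`
interface, the `LockedConstraintPut` class, and the `ReentrantLock`
standing in for a row lock are hypothetical names chosen for
illustration. The only point is that the check and the write share one
lock hold.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: nothing here is HBase API; all names are hypothetical.
class LockedConstraintPut {
    // Hypothetical functional constraint over (current value, proposed value).
    interface Constraint {
        boolean allows(long current, long proposed);
    }

    private final ReentrantLock rowLock = new ReentrantLock(); // stand-in for a row lock
    private long cell;

    LockedConstraintPut(long initial) {
        this.cell = initial;
    }

    // The check and the write happen under a single lock hold, so no other
    // writer can change the cell between the two steps.
    boolean checkAndPut(Constraint constraint, long proposed) {
        rowLock.lock();
        try {
            if (!constraint.allows(cell, proposed)) {
                return false; // rejected; cell is left untouched
            }
            cell = proposed;
            return true;
        } finally {
            rowLock.unlock();
        }
    }

    long get() {
        rowLock.lock();
        try {
            return cell;
        } finally {
            rowLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedConstraintPut row = new LockedConstraintPut(100);
        Constraint dropByMoreThanTen = (current, proposed) -> (current - proposed) > 10;
        boolean[] accepted = new boolean[2];
        Thread a = new Thread(() -> accepted[0] = row.checkAndPut(dropByMoreThanTen, 80));
        Thread b = new Thread(() -> accepted[1] = row.checkAndPut(dropByMoreThanTen, 75));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("A accepted: " + accepted[0] + ", B accepted: " + accepted[1]
                + ", final value: " + row.get());
    }
}
```

Whichever thread takes the lock first is accepted; the other is then
evaluated against the updated value and rejected, so exactly one of the
two puts lands. A plain value comparison, as checkAndPut does today with
a comparator, would be the special case of a constraint that only tests
`current`.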

Without being able to atomically enforce the constraint, I'm wondering
if it is misleading to future users who may create a constraint that
may fail to be enforced under heavy concurrent use.

I know a lot of work has gone into the patches ... but I thought it
better to discuss this before rolling it out to the larger community
... :)

Thanks,
--Suraj


On Fri, Dec 9, 2011 at 1:31 PM, Suraj Varma <svarma.ng@gmail.com> wrote:
> Hi:
> I opened a jira ticket on this: https://issues.apache.org/jira/browse/HBASE-4999
>
> I have linked to HBASE-4605 in the description to show related work on
> Constraints by Jesse.
>
> Thanks!
> --Suraj
>
> On Sun, Dec 4, 2011 at 1:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> Currently ConstraintProcessor latches onto prePut() to perform validation
>> check.
>>
>> From HRegion.doMiniBatchPut() where prePut() is called:
>>    /* Run coprocessor pre hook outside of locks to avoid deadlock */
>> So to make use of Constraint in Suraj's scenario, we have some decisions to
>> make about various factors.
>>
>> Cheers
>>
>> On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <svarma.ng@gmail.com> wrote:
>>
>>> Jesse:
>>> >> Quick soln - write a CP to check the single row (blocking the put).
>>>
>>> Yeah - given that I want this to be atomically done, I'm wondering if
>>> this would even work (because, I believe I'd need to unlock the row so
>>> that the checkAndMutate can take the lock - so, there is a brief
>>> window between where there is no lock being held - and some other
>>> thread could take that lock). One option would be to pass in a lock to
>>> checkAndMutate ... but that would increase the locking period and may
>>> have performance implications, I think.
>>>
>>> I see a lot of potential in the Constraints implementation - it would
>>> really open up CAS operations to do functional constraint checking,
>>> rather than just value comparisons.
>>>
>>> --Suraj
>>>
>>> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <svarma.ng@gmail.com> wrote:
>>> > Thanks - I see that the lock is taken internal to checkAndMutate.
>>> >
>>> > I'm wondering whether it is a better idea to actually pass in a
>>> > Constraint (or even Constraints) as the checkAndMutate argument. Right
>>> > now it is taking in a Comparator and a CompareOp for verification.
>>> > But, this could just be a special case of Constraint which is
>>> > evaluated within the lock.
>>> >
>>> > In other words, we could open up a richer Constraint checking api
>>> > where any "functional" Constraint check can be performed in the
>>> > checkAndPut operation.
>>> >
>>> > This would also not have the same performance impact of taking a
>>> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
>>> > really (in my mind) implementing the compare-and-set more generically.
>>> >
>>> > I also see the potential of passing in multiple constraints (say
>>> > upper/lower bounds in Increment/Decrement operations) etc.
>>> >
>>> > --Suraj
>>> >
>>> >
>>> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> >> From HRegionServer.checkAndPut():
>>> >>    if (region.getCoprocessorHost() != null) {
>>> >>      Boolean result = region.getCoprocessorHost()
>>> >>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL,
>>> >>          comparator, put);
>>> >> ...
>>> >>    boolean result = checkAndMutate(regionName, row, family, qualifier,
>>> >>      CompareOp.EQUAL, new BinaryComparator(value), put,
>>> >>      lock);
>>> >> We can see that the lock isn't taken for preCheckAndPut().
>>> >>
>>> >> To satisfy Suraj's requirement, I think a slight change to
>>> >> checkAndPut() is needed so that atomicity can be achieved across
>>> >> preCheckAndPut() and checkAndMutate().
>>> >>
>>> >> Cheers
>>> >>
>>> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <svarma.ng@gmail.com> wrote:
>>> >>
>>> >>> Just so my question is clear ... everything I'm suggesting is in the
>>> >>> context of a single row (not cross row / table). - so, yes, I'm
>>> >>> guessing obtaining a RowLock on the region side during preCheckAndPut
>>> >>> / postCheckAndPut would certainly work. Which was why I was asking
>>> >>> whether the pre/postCheckAndPut obtains the row lock or whether the
>>> >>> row lock is only obtained within checkAndPut.
>>> >>>
>>> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
>>> >>> that even work? i.e. can the same rowlock be inherited by the
>>> >>> checkAndPut api within that thread's context? Or will preCheckAndPut
>>> >>> have to release the lock so that checkAndPut can take it (which won't
>>> >>> work for my case, as it has to be atomic between the preCheck and
>>> >>> Put.)
>>> >>>
>>> >>> Thanks for pointing me to the Constraints functionality - I'll take a
>>> >>> look at whether it could potentially work.
>>> >>> --Suraj
>>> >>>
>>> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
>>> >>> > I think the feature you are looking for is a Constraint. Currently
>>> >>> > they are being added to 0.94 in HBASE-4605
>>> >>> > <https://issues.apache.org/jira/browse/HBASE-4605>;
>>> >>> > they are almost ready to be rolled in, and backporting to 0.92 is
>>> >>> > definitely doable.
>>> >>> >
>>> >>> > However, Constraints aren't going to be quite flexible enough to
>>> >>> > efficiently support what you are describing. For instance, with a
>>> >>> > constraint, you are ideally just checking the put value against some
>>> >>> > simple constraint (never over 10 or always an integer), but looking
>>> >>> > at the current state of the table before allowing the put would
>>> >>> > currently require creating a full blown connection to the local
>>> >>> > table through another HTable.
>>> >>> >
>>> >>> > In the short term, you could write a simple coprocessor to do this
>>> >>> > checking and then move over to constraints (which are a simpler,
>>> >>> > more flexible, way of doing this) when the necessary features have
>>> >>> > been added.
>>> >>> >
>>> >>> > It is worth discussing if it makes sense to have access to the local
>>> >>> > region through a constraint, though that breaks the idea a little
>>> >>> > bit, it would certainly be useful and not overly wasteful in terms
>>> >>> > of runtime.
>>> >>> >
>>> >>> > Supposing the feature would be added to talk to the local table, and
>>> >>> > since the puts are going to be serialized on the regionserver (at
>>> >>> > least to that single row you are trying to update), you will never
>>> >>> > get a situation where the value added is over the threshold. If you
>>> >>> > were really worried about the atomicity of the operation, then when
>>> >>> > doing the put, first get the RowLock, then do the put and release
>>> >>> > the RowLock. However, that latter method is going to be really slow,
>>> >>> > so should only be used as a stop gap if the constraint doesn't work
>>> >>> > as expected, until a patch is made for constraints.
>>> >>> >
>>> >>> > Feel free to open up a ticket and link it to 4605 for adding the
>>> >>> > local table access functionality, and we can discuss the de/merits
>>> >>> > of adding the access.
>>> >>> >
>>> >>> > -Jesse
>>> >>> >
>>> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <svarma.ng@gmail.com> wrote:
>>> >>> >
>>> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
>>> >>> >> coprocessors and I'm wondering ... are these pre/post checks done
>>> >>> >> _after_ taking the row lock or is the row lock only done within the
>>> >>> >> checkAndPut api.
>>> >>> >>
>>> >>> >> I'm interested in seeing if we can implement something like:
>>> >>> >> (in pseudo sql)
>>> >>> >> update table-name
>>> >>> >> set column-name = new-value
>>> >>> >> where (column-value - new-value) > threshold-value
>>> >>> >>
>>> >>> >> Basically ... I want to enhance the checkAndPut to not just compare
>>> >>> >> "values" ... but apply an arbitrary function on the value
>>> >>> >> _atomically_ in the Put call. Multiple threads would be firing these
>>> >>> >> mutations and I'd like the threshold-value above to never be
>>> >>> >> breached under any circumstance.
>>> >>> >>
>>> >>> >> Is there a solution that can be implemented either via checkAndPut
>>> >>> >> or using coprocessors preCheckAndPut? If not, would this be a useful
>>> >>> >> feature to build in HBase?
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> --Suraj
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > -------------------
>>> >>> > Jesse Yates
>>> >>> > 240-888-2200
>>> >>> > @jesse_yates
>>> >>>
>>>
