hbase-user mailing list archives

From Clint Morgan <clint.mor...@troove.net>
Subject Re: Secondary indexes and transactions
Date Tue, 12 Jan 2010 00:21:27 GMT
The client drives the 2PC process, so after it has established that a trx
may be committed (by asking each region), it tells each region to commit.
Only then does it actually start to write to the base/indexed tables. So we
don't really have a problem with "overlapping rollbacks", because a rollback
is simply not processing the Puts.
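Here is a minimal Python sketch of the client-driven flow described above. Python is used only for brevity; the real code is Java. `Region`, `can_commit`, `commit`, `abort` and `run_commit` are hypothetical stand-ins, not the transactional contrib's API. The preset `conflicts` set is also an assumption, standing in for whatever validation a real region performs. The point is that Puts stay buffered until every region has voted yes, so a rollback is just dropping the buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical region that buffers each transaction's Puts until commit."""
    name: str
    rows: dict = field(default_factory=dict)     # committed row key -> value
    pending: dict = field(default_factory=dict)  # txn id -> [(row, value), ...]
    conflicts: set = field(default_factory=set)  # assumed: txn ids that fail validation

    def put(self, txn_id, row, value):
        # Only buffered; the table itself is untouched until commit().
        self.pending.setdefault(txn_id, []).append((row, value))

    def can_commit(self, txn_id):
        # Phase 1 vote. A real region would check for conflicting writes here.
        return txn_id not in self.conflicts

    def commit(self, txn_id):
        # Phase 2: apply the buffered Puts.
        for row, value in self.pending.pop(txn_id, []):
            self.rows[row] = value

    def abort(self, txn_id):
        # Rollback: simply discard the buffered Puts.
        self.pending.pop(txn_id, None)


def run_commit(txn_id, regions):
    """Client-driven 2PC: ask every region, then commit all or abort all."""
    if all(r.can_commit(txn_id) for r in regions):
        for r in regions:
            r.commit(txn_id)
        return True
    for r in regions:
        r.abort(txn_id)
    return False


a, b = Region("a"), Region("b")
a.put(1, "row1", "x")
b.put(1, "row2", "y")
print(run_commit(1, [a, b]), a.rows, b.rows)  # True {'row1': 'x'} {'row2': 'y'}

b.conflicts.add(2)
a.put(2, "row1", "z")
b.put(2, "row2", "w")
print(run_commit(2, [a, b]), a.rows, b.rows)  # False {'row1': 'x'} {'row2': 'y'}
```

Because writes are kept out of the table until phase 2, rollback is cheap: there is nothing to undo.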

When the client tells each region to commit, the region will process the
Puts which will then trigger the RPCs to update the index. Transactional
conflicts should not cause an index to get out of sync, because a
conflicting transaction's writes never happen.
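A similar hypothetical sketch shows the region side. On commit, the region applies the buffered Puts and only then derives the index changes: it removes the stale index entry and adds the new one. The in-memory `index` dict stands in for the RPCs to the index table. The `(value, row)` index key is an assumed layout for illustration, not necessarily what the tableindexed contrib uses. An aborted transaction never reaches `commit`, so it never touches the index.

```python
class IndexedRegion:
    """Hypothetical region that maintains one secondary index at commit time."""

    def __init__(self, index_column):
        self.index_column = index_column
        self.base = {}     # row key -> {column: value}
        self.index = {}    # (indexed value, row key) -> row key; stands in for the index table
        self.pending = {}  # txn id -> [(row, columns), ...]

    def put(self, txn_id, row, columns):
        # Buffered only; neither the base table nor the index is touched yet.
        self.pending.setdefault(txn_id, []).append((row, columns))

    def abort(self, txn_id):
        # Nothing was written to the base table, so there is nothing to undo in the index.
        self.pending.pop(txn_id, None)

    def commit(self, txn_id):
        for row, columns in self.pending.pop(txn_id, []):
            old = self.base.get(row, {}).get(self.index_column)
            self.base.setdefault(row, {}).update(columns)
            new = self.base[row].get(self.index_column)
            if old == new:
                continue  # indexed column unchanged, no index update needed
            if old is not None:
                self.index.pop((old, row), None)  # stands in for a Delete RPC to the index table
            if new is not None:
                self.index[(new, row)] = row      # stands in for a Put RPC to the index table


region = IndexedRegion("color")
region.put(1, "r1", {"color": "red"})
region.commit(1)
region.put(2, "r1", {"color": "blue"})
region.abort(2)      # conflicting txn: index still holds ("red", "r1")
region.put(3, "r1", {"color": "blue"})
region.commit(3)
print(region.index)  # {('blue', 'r1'): 'r1'}
```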

If the regionserver crashes during this commit process, then I *think* it
should still recover correctly. It will see the transactional operations in
the WAL, and then propagate the puts into the index. However, this WAL
recovery stuff has been changing, and I'm not confident that it currently
works in all failure cases.
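For the crash case, here is a sketch of the replay idea. It assumes, for illustration only, that the WAL holds per-transaction Put records plus an explicit commit or abort marker; this is not HBase's actual log format. Only transactions with a commit marker are re-applied, and their Puts are propagated into the index as they would have been on the normal path. As noted above, it is uncertain whether the real recovery code handles every failure case.

```python
def recover(wal):
    """Replay a hypothetical transactional WAL, rebuilding base rows and index entries.

    Assumed record shapes: ("put", txn_id, row, value), ("commit", txn_id),
    ("abort", txn_id). The index key is assumed to be (value, row).
    """
    pending, base, index = {}, {}, {}
    for record in wal:
        kind, txn_id = record[0], record[1]
        if kind == "put":
            pending.setdefault(txn_id, []).append(record[2:])
        elif kind == "commit":
            for row, value in pending.pop(txn_id, []):
                old = base.get(row)
                if old is not None:
                    index.pop((old, row), None)  # drop the stale index entry
                base[row] = value
                index[(value, row)] = row        # propagate the Put into the index
        elif kind == "abort":
            pending.pop(txn_id, None)
    # Anything left in `pending` never reached a commit marker and is discarded.
    return base, index


wal = [
    ("put", 1, "r1", "red"),
    ("put", 2, "r2", "green"),
    ("commit", 1),
    ("put", 3, "r1", "blue"),  # txn 3 never logs a commit marker (e.g. crash mid-commit)
    ("abort", 2),
]
print(recover(wal))  # ({'r1': 'red'}, {('red', 'r1'): 'r1'})
```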

Does this description of the normal case address your concerns?

-clint

On Sun, Jan 3, 2010 at 4:46 PM, Mridul Muralidharan
<mridulm@yahoo-inc.com> wrote:

> stack wrote:
>
>> On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
>> <mridulm@yahoo-inc.com> wrote:
>>
>>> I was wondering about the atomicity guarantees when using secondary
>>> indexes from within a transaction.
>>>
>>>
>> You are talking about indexed hbase from transactional hbase contrib?
>>
>
>
> Yes, exactly.
>
>
>
>>
>>> From what I could gather, updates to the index table go through their own
>>> (set of) RPCs before the underlying transactional table is updated, and
>>> these updates happen outside of the locks for the transactional table.
>>>
>>>
>> Yes. But IIUC, the client is running a transaction that spans the update
>> to the two tables. It'll take care of the undo should, say, the update to
>> the transactional table fail.
>>
>>
>
> Isn't the update to the secondary index done implicitly? As in, does the
> client 'see' this update?
> My impression was that the secondary index update was done by the
> IndexedRegion, and was not visible to the client, which manages the OCC
> transaction.
>
>
>
>
>>
>>> Also, the index regions need not be colocated with the table region.
>>>
>>> So essentially wondering
>>> a) if the index can go out of sync with the transactional table ?
>>>
>>>
>> It should not.  The client should run the undos if the insert does not go
>> into both tables successfully.
>>
>>
>>
>>> b) if there are errors with the update to the table, are the indexes
>>> rolled back?
>>>
>>>
>> Yes.
>>
>>
>>
>>> c) whether there can be issues if parallel updates are invoked for the
>>> same row, i.e. whether index changes can end up inconsistent with table
>>> data (due to the lock not being held while updating the index).
>>>
>>>
>>
>> This might be possible. There is a lock held on a row. I'm not sure if
>> the lock is held on the transactional table row while the update is being
>> done to the index table.
>>
>> This is the doc. as it stands on transactional hbase:
>>
>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description
>>
>> Here is the doc. on indexed-transactional hbase:
>>
>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
>>
>> You've probably tripped over it already but just in case, it might help.
>>
>
>
> I did go through the package summaries, thanks; that is what increased my
> confusion.
>
> My current understanding is :
>
> a) The client 'simulates' the transaction by inspecting the state of the
> rows on commit and rolling back in case of conflicting updates.
>
> b) Secondary index updates are transparent to the client API and are done
> directly by the IndexedRegion as part of its implementation.
>
>
> If this is correct, I am wondering whether overlapping rollbacks can cause
> the secondary index to go out of sync with the table, since (a) does not
> see those updates (one update gets rolled back while another goes through,
> or variations of it).
>
>
>
> Thanks,
> Mridul
>
>
>
>> St.Ack
>>
>>
>>
>>> I guess they are all kind of related queries.
>>>
>>>
>>> I was not able to get a clear picture from the archives, so RTFM/pointers
>>> would be helpful if this is already answered.
>>>
>>> Thanks,
>>> Mridul
>>>
>>>
>
