phoenix-issues mailing list archives

From "Ohad Shacham (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-5090) Discuss: Allow transactional writes without buffering the entire transaction on the client.
Date Sun, 06 Jan 2019 09:00:00 GMT


Ohad Shacham commented on PHOENIX-5090:

We added row-level conflict analysis in OMID-71.

As [~yonigo] said, this version keeps information about the modified cells on the client side.

This cell information is needed for two reasons: the first is adding shadow cells at post-commit,
and the second is deleting these cells when a transaction aborts.
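To make the two uses of the client-side cell information concrete, here is a minimal sketch under stated assumptions: `FakeTable` is a hypothetical in-memory stand-in for an HBase table (not Omid's or HBase's API), cells are `(row, family, qualifier)` tuples, and uncommitted data is versioned by the transaction's start timestamp. It only illustrates why the write set must stay on the client until commit or abort.

```python
class FakeTable:
    """Hypothetical in-memory stand-in for an HBase table (not a real API)."""

    def __init__(self):
        self.data = {}    # (row, family, qualifier) -> {version: value}
        self.shadow = {}  # (row, family, qualifier) -> {start_ts: commit_ts}

    def put(self, cell, version, value):
        self.data.setdefault(cell, {})[version] = value

    def delete_version(self, cell, version):
        # Remove exactly one version of one cell, leaving older versions intact.
        self.data.get(cell, {}).pop(version, None)


class ClientTransaction:
    """Writes go to the table immediately; the cell coordinates stay on the client."""

    def __init__(self, table, start_ts):
        self.table = table
        self.start_ts = start_ts
        self.write_set = set()  # cells this transaction wrote (kept on the client)

    def put(self, row, family, qualifier, value):
        cell = (row, family, qualifier)
        self.table.put(cell, self.start_ts, value)  # uncommitted, versioned by start_ts
        self.write_set.add(cell)

    def on_commit(self, commit_ts):
        # Reason 1: add a shadow cell (start_ts -> commit_ts) for every written cell.
        for cell in self.write_set:
            self.table.shadow.setdefault(cell, {})[self.start_ts] = commit_ts

    def on_abort(self):
        # Reason 2: delete exactly the versions this transaction wrote.
        for cell in self.write_set:
            self.table.delete_version(cell, self.start_ts)
```

If `write_set` were discarded, neither `on_commit` nor `on_abort` would know which cells to touch, which is exactly the memory trade-off discussed below.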

The current implementation supports either ROW-level conflict analysis or CELL-level conflict
analysis. A combination of the two is not supported :) Is it needed? I mean, do SQL semantics
allow modifications of different cells in the same row without a conflict? It can certainly be
extended; however, that can incur some runtime penalties.
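The difference between the two granularities comes down to what key a write contributes to conflict detection. The sketch below is a simplified illustration, not Omid's code: plain tuples stand in for whatever keys the real conflict detector uses, and `conflict_keys`/`conflicts` are hypothetical helpers. Two transactions that modify different cells of the same row conflict under ROW-level analysis but not under CELL-level analysis.

```python
def conflict_keys(write_set, level):
    """Map (table, row, family, qualifier) cells to conflict keys.

    level="row":  all cells of a row collapse into one (table, row) key.
    level="cell": every distinct cell is its own key.
    """
    if level == "row":
        return {(table, row) for (table, row, _family, _qualifier) in write_set}
    if level == "cell":
        return set(write_set)
    raise ValueError(f"unknown conflict level: {level}")


def conflicts(write_set_a, write_set_b, level):
    # Two concurrent transactions conflict if their conflict keys intersect.
    return bool(conflict_keys(write_set_a, level) & conflict_keys(write_set_b, level))


# Same row, different columns.
tx_a = {("T", "r1", "f", "x")}
tx_b = {("T", "r1", "f", "y")}
```

With these inputs, `conflicts(tx_a, tx_b, "row")` is `True` and `conflicts(tx_a, tx_b, "cell")` is `False`; a mixed mode would have to choose per table or per statement which of the two key shapes to emit.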


Regarding the memory limitations: in general, the cell-level information can be discarded
from the client side; however, this removes both the shadow-cell update at post-commit and
the deletion that the client performs in case of an abort. For the first, we can rely on the
fact that clients that read a cell without a shadow cell create that shadow cell. This does
not happen asynchronously, and it pays off only if cells are read many times. Another
disadvantage is that the commit information will stay in the commit table and can only be
discarded later by a GC. For earlier GC, we can add a per-transaction counter in the commit
table that records how many cells/rows the transaction wrote, decrement it whenever some client
adds a shadow cell, and delete the entry when it reaches zero. However, this requires
CheckAndMutate for the update, and I am sure this is not what we would like to do. We can add a
shadow cell at the ROW level, as [~yonigo] suggested, but this might require additional HBase
gets when looking for that shadow cell.
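The reader-side repair plus the per-transaction counter described above can be sketched as follows. This is a single-threaded illustration under assumptions: `CommitTable` and `resolve_commit_ts` are hypothetical names, a dict stands in for the commit table, and the counter tracks cells that still lack a shadow cell. In HBase, the read-decrement-delete step (and the conditional shadow-cell write that guards it) would need CheckAndMutate, which is the cost the comment objects to.

```python
class CommitTable:
    """Hypothetical commit table: start_ts -> [commit_ts, cells still lacking a shadow cell]."""

    def __init__(self):
        self._rows = {}

    def add(self, start_ts, commit_ts, cells_written):
        self._rows[start_ts] = [commit_ts, cells_written]

    def commit_ts(self, start_ts):
        entry = self._rows.get(start_ts)
        return None if entry is None else entry[0]

    def shadow_cell_added(self, start_ts):
        # In HBase this read-decrement-delete would need CheckAndMutate to be safe
        # under concurrent readers; this single-threaded sketch just updates a dict.
        entry = self._rows.get(start_ts)
        if entry is None:
            return
        entry[1] -= 1
        if entry[1] == 0:
            del self._rows[start_ts]  # early GC: every written cell now has a shadow cell


def resolve_commit_ts(cell, start_ts, shadow, commit_table):
    """Return the commit ts of version start_ts of cell, or None if not committed.

    A reader that finds no shadow cell consults the commit table and, if the
    transaction committed, writes the missing shadow cell itself.
    """
    cell_shadow = shadow.setdefault(cell, {})
    if start_ts in cell_shadow:
        return cell_shadow[start_ts]
    commit_ts = commit_table.commit_ts(start_ts)
    if commit_ts is not None:
        cell_shadow[start_ts] = commit_ts
        commit_table.shadow_cell_added(start_ts)
    return commit_ts
```

A cell that is never read keeps its commit-table entry alive, so the early GC only helps for data that is actually read back.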


For the second, we can wait for the GC to clean up these cells, but this will happen only once
the transaction id falls below the low water mark. As far as we saw, an HBase row delete
operation deletes all of the row's columns with a version *lower* than or equal to the transaction
version, which would also remove older, committed versions, so we cannot use it.
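The problem with a row delete can be shown with a small model of HBase's versioned cells. The sketch assumes the documented semantics of a row-level `Delete(row, timestamp)` with no column specified (remove every column's versions with timestamp less than or equal to the given one) and of a per-column `Delete.addColumn(family, qualifier, timestamp)` (remove exactly that version). The two helper functions are hypothetical models, not HBase calls.

```python
def row_delete(row_data, ts):
    """Model of a row-level delete at ts: drop every version <= ts in every column."""
    return {col: {v: val for v, val in versions.items() if v > ts}
            for col, versions in row_data.items()}


def exact_version_delete(row_data, written_columns, ts):
    """Model of per-column deletes of exactly version ts; needs the written column list."""
    out = {col: dict(versions) for col, versions in row_data.items()}
    for col in written_columns:
        out.get(col, {}).pop(ts, None)
    return out


# Column "a" has a committed version at 5 and an aborted write at 10.
row = {"a": {5: "committed", 10: "aborted"}, "b": {5: "committed"}}
```

Rolling back with `row_delete(row, 10)` also wipes the committed data at version 5, while `exact_version_delete(row, ["a"], 10)` removes only the aborted write but needs the list of written columns, which is precisely the client-side information we wanted to drop.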



> Discuss: Allow transactional writes without buffering the entire transaction on the client.
> -------------------------------------------------------------------------------------------
>                 Key: PHOENIX-5090
>                 URL:
>             Project: Phoenix
>          Issue Type: Wish
>            Reporter: Lars Hofhansl
>            Priority: Major
> Currently it is not possible to execute transactions in Phoenix that are too large to be buffered entirely on the client.
> Both Tephra and Omid support writing uncommitted data to HBase immediately and at full speed. The client still needs to keep track of the changed rows for:
> # Conflict detection
> # (for Omid) writing the shadow cells
> I'd like to do some brainstorming here.
> * It should *always* be enough to hold on to only the changed rows (and columns?) for _conflict resolution_ and free the rest from the client as soon as the uncommitted data is written to HBase.
> * For the shadow cells we only need to keep the changed rows, right?
> * There are situations where we can avoid client-side buffering entirely (perhaps only for Tephra) when we declare that a table or upsert does not participate in conflict resolution.
> [~tdsilva], [~ohads], [~yonigo], [~jamestaylor], [~vincentpoon], more, better ideas?
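The first bullet of the quoted description (flush each mutation immediately and keep only the changed row keys for conflict resolution) can be sketched like this. `StreamingTxnClient` and its `write_fn` callback are hypothetical; `write_fn` stands in for whatever sends the uncommitted mutation to HBase. As the comment above points out, keeping only row keys is not enough for Omid's shadow cells and abort cleanup, which need the written cells.

```python
class StreamingTxnClient:
    """Hypothetical client: flushes each mutation right away, keeps only (table, row) keys."""

    def __init__(self, write_fn):
        self._write_fn = write_fn  # stand-in for sending the uncommitted mutation to HBase
        self.changed_rows = set()

    def upsert(self, table, row, columns):
        self._write_fn(table, row, columns)  # the full mutation leaves the client here
        self.changed_rows.add((table, row))  # only the row key is retained

    def conflict_set(self):
        # What would be handed to conflict detection at commit time.
        return frozenset(self.changed_rows)
```

Client memory then grows with the number of distinct rows touched rather than with the size of the data written, so repeated upserts to the same row cost nothing extra.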

This message was sent by Atlassian JIRA
