tephra-dev mailing list archives

From "Poorna Chandra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEPHRA-35) Prune invalid transaction set once all data for a given invalid transaction has been dropped
Date Tue, 12 Jul 2016 07:39:20 GMT

    [ https://issues.apache.org/jira/browse/TEPHRA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372413#comment-15372413 ]

Poorna Chandra commented on TEPHRA-35:

[~jamestaylor] We could go with the approach of having the transaction manager track the tables
involved in a transaction. In that case it is better to record the tables at the start of the
transaction rather than at commit time. By commit time the transaction is already being committed,
so there is only a very small chance that it will become invalid.

When a transaction client starts a transaction, if it knows in advance all the tables that
will be used in the transaction (as in Phoenix's case), it can send the table names to the
transaction manager to be associated with the new transaction. The transaction manager can
then track this (transactionId -> table-list) information and use it in place of "region set
at time t" to determine whether all data associated with an invalid transaction has been
removed from all regions associated with that transaction. This reduces the wait time for
pruning: an invalid transaction can be pruned as soon as all tables associated with it have
been major compacted.
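
To make the (transactionId -> table-list) idea concrete, here is a minimal sketch in Java. It is
only an illustration under assumptions: the class InvalidTxTableTracker and its methods are
hypothetical names, not part of Tephra's API, and it assumes each major compaction reports the
invalid transaction IDs it filtered out of a table.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Tephra.
class InvalidTxTableTracker {
  // transactionId -> tables declared by the client at transaction start
  private final Map<Long, Set<String>> tablesByTx = new HashMap<>();
  // table -> invalid transaction IDs whose data a major compaction has removed
  private final Map<String, Set<Long>> cleanedByTable = new HashMap<>();

  // Called when the client starts a transaction and knows its tables up front.
  void startTransaction(long txId, Collection<String> tables) {
    tablesByTx.put(txId, new HashSet<>(tables));
  }

  // Assumed to be reported after a major compaction of the given table.
  void recordMajorCompaction(String table, Collection<Long> filteredInvalidTxIds) {
    cleanedByTable.computeIfAbsent(table, t -> new HashSet<>()).addAll(filteredInvalidTxIds);
  }

  // True only if every table recorded for the transaction has been compacted
  // with this transaction's data filtered out.
  boolean canPrune(long invalidTxId) {
    Set<String> tables = tablesByTx.get(invalidTxId);
    if (tables == null) {
      return false; // no table list recorded, so this check cannot decide
    }
    for (String table : tables) {
      Set<Long> cleaned = cleanedByTable.get(table);
      if (cleaned == null || !cleaned.contains(invalidTxId)) {
        return false;
      }
    }
    return true;
  }
}
```

With this shape, a transaction that declared two tables becomes prunable only after both tables
have been major compacted, regardless of the state of any other table.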

In addition to this, we will still need to have the other approach of waiting for all tables
to major compact. This is to handle cases where the client does not know all tables that are
part of a transaction at transaction start time. However, such invalid transactions without
table information should not hold back pruning of invalid transactions with table information.
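
A rough Java sketch of how the two approaches could coexist is below. The names PruneCandidates
and prunable are hypothetical, and the sketch assumes the same per-table record of filtered
invalid transaction IDs. A transaction with a table list is checked against only those tables,
while one without a table list falls back to requiring every table.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not Tephra code.
class PruneCandidates {
  // Returns the invalid transaction IDs that are safe to prune.
  static Set<Long> prunable(Set<Long> invalid,
                            Map<Long, Set<String>> tablesByTx,
                            Set<String> allTables,
                            Map<String, Set<Long>> cleanedByTable) {
    Set<Long> result = new TreeSet<>();
    for (long txId : invalid) {
      // No table list recorded: fall back to waiting on all tables.
      Set<String> required = tablesByTx.getOrDefault(txId, allTables);
      boolean clean = true;
      for (String table : required) {
        Set<Long> cleaned = cleanedByTable.getOrDefault(table, Collections.emptySet());
        if (!cleaned.contains(txId)) {
          clean = false;
          break;
        }
      }
      if (clean) {
        result.add(txId);
      }
    }
    return result;
  }
}
```

Because each transaction is evaluated on its own, one without table information stays in the
invalid set until all tables are compacted, but it does not delay the others.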

Tracking tables and their associated transactions requires an API change and some changes to
the transaction manager's internal data structures. To keep the changes contained, we can first
implement the solution of waiting for all tables to major compact. Then, as a second installment,
we can add this enhancement of tracking tables and transactions.

> Prune invalid transaction set once all data for a given invalid transaction has been dropped
> --------------------------------------------------------------------------------------------
>                 Key: TEPHRA-35
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-35
>             Project: Tephra
>          Issue Type: New Feature
>            Reporter: Gary Helmling
>            Assignee: Poorna Chandra
>            Priority: Blocker
>         Attachments: ApacheTephraAutomaticInvalidListPruning-v2.pdf
> In addition to dropping the data from invalid transactions we need to be able to prune
> the invalid set of any transactions where data cleanup has been completely performed. Without
> this, the invalid set will grow indefinitely and become a greater and greater cost to in-progress
> transactions over time.
> To do this correctly, the TransactionDataJanitor coprocessor will need to maintain some
> bookkeeping for the transaction data that it removes, so that the transaction manager can
> reason about when all of a given transaction's data has been removed. Only at this point can
> the transaction manager safely drop the transaction ID from the invalid set.
> One approach would be for the TransactionDataJanitor to update a table marking when a
> major compaction was performed on a region and what transaction IDs were filtered out. Once
> all regions in a table containing the transaction data have been compacted, we can remove
> the filtered out transaction IDs from the invalid set. However, this will need to cope with
> changing region names due to splits, etc.

This message was sent by Atlassian JIRA
