tephra-dev mailing list archives

From "Andreas Neumann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEPHRA-247) Avoid encoding the transaction multiple times
Date Mon, 15 Jan 2018 18:21:00 GMT

    [ https://issues.apache.org/jira/browse/TEPHRA-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326493#comment-16326493 ]

Andreas Neumann commented on TEPHRA-247:

I do see that it could be possible to work around the region split using a post split hook
- but I still don't feel comfortable with the approach. The issue we are trying to solve is
that when the invalid list gets large - and so does the transaction object - then we encode,
transmit and decode this large object with every get() performed in this transaction.

A very important case is a small transaction - say a transaction that performs a single get
or scan, followed by a put, and then commits. Today, this requires sending the transaction
only once: for the read operation, and it only gets sent to one region, or only the regions
involved in the scan. The proposed design requires that we send the transaction to every region
when the transaction starts. That appears to add overhead rather than reducing overhead. 

I feel that if we want to reduce overhead, we have multiple angles to look at this:
 * reduce the cost of encoding, transmitting and decoding the tx. This could involve:
 ** using a more efficient (faster) or more compact (smaller) codec
 ** caching the encoded transaction on the client side after it was encoded for the first
time
 ** caching the decoded transaction in region servers after it has been decoded for the
first time
 * avoid decoding the tx altogether by using a codec that does not require decoding. That
is, instead of binary search in an array of tx ids, some encoding that allows searching directly
on the binary representation. 
 * avoid transmitting the invalid list. A possibility is to rely on the existing TransactionStateCache,
which has knowledge about the invalid transactions in the last snapshot. That could allow
us to only transmit the invalid transactions added since the last snapshot. 
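
To make the "no decoding" angle above more concrete, here is a minimal sketch of
searching the invalid list directly on its binary form. It is not Tephra's codec: the
class name EncodedInvalidList and the layout (sorted tx ids, each written as a fixed-width
8-byte big-endian long) are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration only, not part of Tephra.
 * Assumed layout: tx ids sorted ascending, each stored as an 8-byte big-endian long.
 */
public class EncodedInvalidList {
    private final ByteBuffer encoded; // wraps the received bytes, no long[] is built
    private final int count;          // number of tx ids in the buffer

    public EncodedInvalidList(byte[] bytes) {
        if (bytes.length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8 bytes");
        }
        this.encoded = ByteBuffer.wrap(bytes); // ByteBuffer defaults to big-endian
        this.count = bytes.length / Long.BYTES;
    }

    /** Encodes an already sorted array of tx ids into the assumed layout. */
    public static byte[] encode(long[] sortedTxIds) {
        ByteBuffer buf = ByteBuffer.allocate(sortedTxIds.length * Long.BYTES);
        for (long id : sortedTxIds) {
            buf.putLong(id);
        }
        return buf.array();
    }

    /** Binary search over the encoded bytes; reads only O(log n) entries. */
    public boolean contains(long txId) {
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // Absolute read at the entry's byte offset; the buffer position is untouched.
            long candidate = encoded.getLong(mid * Long.BYTES);
            if (candidate < txId) {
                lo = mid + 1;
            } else if (candidate > txId) {
                hi = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }
}
```

With this layout a region server could answer "is tx id X invalid?" straight from the
received bytes. The trade-off is that fixed-width entries are less compact than a
variable-length encoding, so this exchanges transmission size for zero decode cost.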

By the way, there is similar overhead in the communication between Transaction Manager and
the client when the transaction is created. That could be another area of improvement.



> Avoid encoding the transaction multiple times
> ---------------------------------------------
>                 Key: TEPHRA-247
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-247
>             Project: Tephra
>          Issue Type: Improvement
>          Components: core, manager
>    Affects Versions: 0.12.0-incubating
>            Reporter: Andreas Neumann
>            Assignee: Andreas Neumann
>            Priority: Major
>         Attachments: design.jpg
> Currently, the same transaction object is encoded again and again for every Get performed
in HBase. It would be better to cache the encoded transaction for the duration of the transaction
and reuse it, 

This message was sent by Atlassian JIRA
