tephra-dev mailing list archives

From "Andreas Neumann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEPHRA-257) If start() encounters an RPC timeout, an invalid transaction is left behind
Date Thu, 12 Oct 2017 20:12:00 GMT

    [ https://issues.apache.org/jira/browse/TEPHRA-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202550#comment-16202550 ]

Andreas Neumann commented on TEPHRA-257:

It turns out that this cannot be fixed as long as Tephra uses Thrift. In theory, we could
modify the code in Thrift's ProcessFunction class that writes the response:
    if(!isOneway()) {
      oprot.writeMessageBegin(new TMessage(getMethodName(), TMessageType.REPLY, seqid));
by wrapping it in a try block and catching any socket exceptions. In practice, however, the
flush() of the response does not flush to the socket: because of Thrift's asynchronous design,
it flushes to a write request queue, and the worker thread that later performs the write is
the one that sees the socket exception. By that time we have lost the context of the call
and have no callback through which to abort the transaction.
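
To illustrate why the try block does not help, here is a minimal sketch in plain Java. It does
not use Thrift at all: WriteRequest, flush() and the queue are hypothetical stand-ins for the
write request queue described above. The handler's catch block never fires, because the
failure only surfaces later on the worker thread.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class AsyncFlushDemo {

    /** Hypothetical stand-in for a queued write request; not a Thrift class. */
    interface WriteRequest {
        void writeToSocket() throws IOException;
    }

    /** Hypothetical flush: it only enqueues the request and never touches a socket. */
    static void flush(BlockingQueue<WriteRequest> queue, WriteRequest request) throws IOException {
        queue.add(request);
    }

    /** Returns { caughtInHandler, caughtInWorker }. */
    static boolean[] run() {
        BlockingQueue<WriteRequest> writeQueue = new LinkedBlockingQueue<>();
        AtomicBoolean caughtInHandler = new AtomicBoolean(false);
        AtomicBoolean caughtInWorker = new AtomicBoolean(false);

        // Handler side: the try block wraps the flush, but the flush cannot fail here.
        try {
            flush(writeQueue, () -> {
                throw new IOException("client closed the socket");
            });
        } catch (IOException e) {
            caughtInHandler.set(true); // never reached in this sketch
        }

        // Worker side: the socket failure shows up here, without the transaction context.
        Thread worker = new Thread(() -> {
            try {
                writeQueue.take().writeToSocket();
            } catch (IOException e) {
                caughtInWorker.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new boolean[] { caughtInHandler.get(), caughtInWorker.get() };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("caught in handler: " + result[0]);
        System.out.println("caught in worker:  " + result[1]);
    }
}
```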

Thus marking this as won't fix. 

> If start() encounters an RPC timeout, an invalid transaction is left behind
> ---------------------------------------------------------------------------
>                 Key: TEPHRA-257
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-257
>             Project: Tephra
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.13.0-incubating
>            Reporter: Andreas Neumann
>            Assignee: Poorna Chandra
> Suppose the following scenario: 
> - a thrift client starts a transaction
> - the server responds, but for whatever reason it is slow 
> - by the time the response is sent, the client has timed out the connection
> - now the server has started a transaction, but the client has no knowledge of it
> - that transaction will never be committed or aborted and eventually times out
> - it becomes an invalid transaction
> This is a common scenario when HDFS is slow and the write load is high: a lot of change
> ids have to be written to a slow transaction log. We then generate invalid transactions
> systematically, which eventually degrades the performance of the entire system.
> It would be good if the server could detect this situation and abort the transaction
> immediately. This is safe to do whenever sending the response fails, because we know that
> the client did not receive it and hence will not generate data with that transaction id.

> This is a tricky change, though: Thrift does not give us a way to intercept exceptions
> from socket failures. We would have to copy a Thrift class (ProcessFunction) and change it
> to handle exceptions that occur during the write of the response.
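
For reference, the behavior requested in the description could look roughly like the sketch
below. It assumes a server in which the response is written synchronously on the handler
thread, so that a send failure is visible to the code that started the transaction. TxManager
and ReplyChannel are hypothetical names invented for this example; they are neither Tephra's
nor Thrift's API.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class AbortOnFailedReply {

    /** Hypothetical minimal transaction manager; Tephra's real API differs. */
    static class TxManager {
        private long nextId = 1;
        private final Set<Long> inProgress = new HashSet<>();

        long start() {
            long id = nextId++;
            inProgress.add(id);
            return id;
        }

        void abort(long id) {
            inProgress.remove(id);
        }

        int inProgressCount() {
            return inProgress.size();
        }
    }

    /** Hypothetical response channel that writes synchronously to the client. */
    interface ReplyChannel {
        void send(long txId) throws IOException;
    }

    /** Starts a transaction and aborts it at once if the reply cannot be delivered. */
    static void handleStart(TxManager txManager, ReplyChannel channel) {
        long txId = txManager.start();
        try {
            channel.send(txId);
        } catch (IOException e) {
            // The client never received txId, so it cannot have written data with it.
            txManager.abort(txId);
        }
    }

    public static void main(String[] args) {
        TxManager txManager = new TxManager();
        // A healthy client receives its transaction id.
        handleStart(txManager, txId -> { });
        // A client that has already timed out: the send fails and the transaction is aborted.
        handleStart(txManager, txId -> {
            throw new IOException("client timed out");
        });
        System.out.println("in progress: " + txManager.inProgressCount()); // prints "in progress: 1"
    }
}
```

Only the transaction of the healthy client stays in progress; the one whose reply failed is
aborted right away instead of lingering until it times out and becomes invalid.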

This message was sent by Atlassian JIRA
