Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Tue, 22 Mar 2016 07:52:25 +0000 (UTC)
From: "Stefania (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12948253.1457491162000.14835.1458633145700@Atlassian.JIRA>
In-Reply-To: <JIRA.12948253.1457491162000@Atlassian.JIRA>
References: <JIRA.12948253.1457491162000@Atlassian.JIRA>
 <JIRA.12948253.1457491162574@arcas>
Subject: [jira] [Commented] (CASSANDRA-11320) Improve backoff policy for
 cqlsh COPY FROM
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205957#comment-15205957 ] 

Stefania commented on CASSANDRA-11320:
--------------------------------------

The patch is ready; I am just waiting for the results of some more tests. 

There are 3 new options:

{code}
         MAXINFLIGHTMESSAGES=512  - the maximum number of messages not yet acknowledged by a replica before the
                                    back-off policy in worker processes kicks in
         MAXBACKOFFATTEMPTS=32    - the maximum number of back-off attempts in worker processes. During each attempt,
                                    if no replica with fewer than MAXINFLIGHTMESSAGES pending messages is found, the
                                    worker process pauses for an amount of time drawn at random between
                                    1 and 2^num-attempts seconds
         MAXPENDINGCHUNKS=24      - the maximum number of chunks not yet read by a worker process. Once this number
                                    is reached, no new chunks are sent from the feeding process to that worker process
{code}

The default values should be reasonable and users should not need to change them. 

If all replicas have more than MAXINFLIGHTMESSAGES messages in progress, a back-off policy is applied in the worker process main thread: it sends no more messages until at least one replica has fewer messages in progress. The pause grows exponentially with each attempt. If there is still no available replica after MAXBACKOFFATTEMPTS attempts, a {{NoHostAvailable}} exception is raised. The old back-off policy is removed; on timeouts we now retry as for any other server error, since the back-off is performed for all messages.
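
To illustrate the worker-side policy described above, here is a minimal Python 3 sketch. It is not the code from the patch: {{pick_replica}}, {{backoff_pause}} and the {{in_flight}} dictionary are hypothetical names, the {{NoHostAvailable}} class is a local stand-in for the driver exception, and the pause is assumed to be a uniform draw with attempts counted from 1.

```python
import random
import time


class NoHostAvailable(Exception):
    """Hypothetical stand-in for the driver exception named in the comment."""


def backoff_pause(attempt, rng=random):
    """Pause in seconds, drawn at random between 1 and 2^attempt."""
    return rng.uniform(1, 2 ** attempt)


def pick_replica(in_flight, max_in_flight=512, max_attempts=32,
                 sleep=time.sleep, rng=random):
    """Return a replica with fewer than max_in_flight unacknowledged messages.

    in_flight maps replica -> number of messages not yet acknowledged. It is
    assumed to be updated elsewhere (for example by response callbacks) while
    this function sleeps.
    """
    for attempt in range(1, max_attempts + 1):
        available = [r for r, n in in_flight.items() if n < max_in_flight]
        if available:
            # Assumption: prefer the least loaded replica among those available.
            return min(available, key=in_flight.get)
        # No replica is below the limit: pause before checking again.
        sleep(backoff_pause(attempt, rng))
    raise NoHostAvailable(
        "no replica below %d in-flight messages after %d attempts"
        % (max_in_flight, max_attempts))
```

Because the pause happens before any message is sent, it applies to new requests and retries alike, which is the point of moving it out of the timeout handler.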

The feeding process now has a thread for each worker process that sends chunks asynchronously. If a worker process has more than MAXPENDINGCHUNKS chunks pending, no further chunks are sent to it. If all worker processes have more than MAXPENDINGCHUNKS chunks pending, the feeding process sleeps for an amount of time that grows exponentially.
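
The feeder-side throttling can be sketched along the same lines. This is only an illustration in Python 3: {{FeederSketch}} and its methods are hypothetical, a plain per-worker counter stands in for whatever inter-process primitive the patch uses, the 10 ms base pause is an assumed value, and the limit is treated as reached at exactly MAXPENDINGCHUNKS, following the option description.

```python
import time


class FeederSketch(object):
    """Tracks how many chunks each worker has been sent but not yet read."""

    def __init__(self, num_workers, max_pending_chunks=24):
        self.max_pending = max_pending_chunks
        self.pending = [0] * num_workers

    def chunk_read(self, worker):
        """Called when a worker reports that it has read one chunk."""
        self.pending[worker] -= 1

    def send_chunk(self, chunk, send, sleep=time.sleep, base_pause=0.01):
        """Send chunk to a worker below the limit, sleeping while none exists.

        send(worker, chunk) performs the actual delivery. The pause doubles
        on every unsuccessful attempt (assumed base: 10 ms).
        """
        attempt = 0
        while True:
            ready = [w for w, n in enumerate(self.pending)
                     if n < self.max_pending]
            if ready:
                # Assumption: pick the worker with the fewest pending chunks.
                worker = min(ready, key=lambda w: self.pending[w])
                self.pending[worker] += 1
                send(worker, chunk)
                return worker
            sleep(base_pause * 2 ** attempt)
            attempt += 1
```

A worker that is itself pausing in its back-off loop stops reading chunks, so its pending count stays at the limit and the feeder naturally stops sending to it.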

The new thread is introduced in {{OneWayChannel}} and will replace the thread introduced by the second patch of CASSANDRA-11053. Generally speaking, it is safer to write into a pipe from a separate thread because the send blocks if the pipe is full. On Windows there doesn't seem to be an API to determine whether the send will block, other than using inter-process synchronization, and I've verified that this is much slower than introducing threads. The performance impact of these threads is on the order of 3-4k rows per second: from 47k rows per second down to 44k rows per second on my laptop, when importing 2M entries generated with a standard stress write.
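
The idea of writing to the pipe from a dedicated thread can be sketched as follows. This is not the {{OneWayChannel}} code from the patch: {{AsyncSender}} is a hypothetical name, and the sketch is written for Python 3 using only the standard {{multiprocessing}}, {{threading}} and {{queue}} modules.

```python
import multiprocessing as mp
import queue
import threading

_STOP = object()  # sentinel telling the sender thread to exit


class AsyncSender(object):
    """Queues objects and writes them to a pipe from a background thread."""

    def __init__(self, conn):
        self._conn = conn
        self._queue = queue.Queue()  # unbounded, so send() never blocks
        self._thread = threading.Thread(target=self._drain)
        self._thread.daemon = True
        self._thread.start()

    def send(self, obj):
        """Return immediately; the background thread does the pipe write."""
        self._queue.put(obj)

    def _drain(self):
        while True:
            obj = self._queue.get()
            if obj is _STOP:
                break
            # This call may block when the pipe is full, but it only blocks
            # this thread, not the caller of send().
            self._conn.send(obj)

    def close(self):
        """Flush everything queued so far and stop the thread."""
        self._queue.put(_STOP)
        self._thread.join()
```

A short usage example, with both ends of the pipe in one process for brevity:

```python
reader, writer = mp.Pipe(duplex=False)
sender = AsyncSender(writer)
sender.send({"chunk": 0})
sender.close()
print(reader.recv())
```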


> Improve backoff policy for cqlsh COPY FROM
> ------------------------------------------
>
>                 Key: CASSANDRA-11320
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11320
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>              Labels: doc-impacting
>             Fix For: 3.x
>
>
> Currently we have an exponential back-off policy in COPY FROM that kicks in when timeouts are received. However, there are two limitations:
> * it does not cover new requests and therefore we may not back off sufficiently to give an overloaded server time to recover
> * the pause is performed in the receiving thread and therefore we may not process server messages quickly enough
> There is a static throttling mechanism in rows per second from feeder to worker processes (the INGESTRATE) but the feeder has no idea of the load of each worker process. However, it's easy to keep track of how many chunks a worker process has yet to read by introducing a bounded semaphore.
> The idea is to move the back-off pauses to the worker process main thread so as to include all messages, new and retries, not just the retries that timed out. The worker process will not read new chunks during the back-off pauses, and the feeder process can then look at the number of pending chunks before sending new chunks to a worker process.
> [~aholmber], [~aweisberg] what do you think?  


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)