cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9304) COPY TO improvements
Date Tue, 13 Oct 2015 19:32:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955515#comment-14955515
] 

Tyler Hobbs commented on CASSANDRA-9304:
----------------------------------------

The new code comments are very nice, thank you for putting those in.  The rest of the changes
look pretty good to me as well.

bq.  I removed StringIO.StringIO, I believe io.StringIO is equivalent and preferred to StringIO.StringIO?
If so, # noqa doesn't seem to be required, the first E402 is at line 117.

It looks like my version of flake8 was old and didn't handle {{try/except}} style imports
well.  Ignore that comment :)

However, there are a couple of other legit flake8 warnings:

{noformat}
bin/cqlsh|2216| W806 local variable 'ks' is assigned to but never used
bin/cqlsh|2217| W806 local variable 'cf' is assigned to but never used
{noformat}

I also tested this out with a 1M-row insert from cassandra-stress and was surprised to see that I got
some timeouts on multiple ranges.  These were {{OperationTimedOut}} errors, so it's not immediately
clear where the hangup is.  I did notice that the current error handling code loses the original
exception class (which can be useful), so I suggest changing {{err_callback()}} from:

{code}
self.pipe.send((token_range, Exception(err.message)))
{code}

to

{code}
self.pipe.send((token_range, Exception(err.__class__.__name__ + " - " + err.message)))
{code}
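
For illustration, here is a minimal standalone sketch of that idea. The {{OperationTimedOut}} class below is only a stand-in so the snippet runs without the driver, {{describe_error()}} is a hypothetical helper name, and {{str(err)}} is used in place of {{err.message}} so it also works on Python 3:

```python
class OperationTimedOut(Exception):
    """Stand-in for the driver's exception so this sketch runs on its own."""


def describe_error(err):
    # Prefix the message with the original class name so that information
    # survives when the error is re-wrapped in a plain Exception.
    return "%s - %s" % (err.__class__.__name__, err)


# Re-wrap the error the way err_callback() would before sending it on.
wrapped = Exception(describe_error(OperationTimedOut("request timed out")))
print(wrapped)
```

The receiving side then sees which kind of error occurred without needing the original exception object.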

To avoid the timeouts, I experimented with lowering the page size from 5k to 1k.  This did
resolve the timeouts for me, and also smoothed the throughput.  I suggest that we lower the
page size (by doing {{session.default_fetch_size = N}}) to 1k just to lower the impact on
nodes.

Additionally, we probably want to add some basic timeout recovery.  The {{err_callback()}}
could perform exponential backoff for a limited number of attempts if an {{OperationTimedOut}}
is thrown.
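
As a rough sketch of what that recovery could look like (not the actual cqlsh code): the helper name {{fetch_with_backoff()}}, its parameters, and the default values are all assumptions for illustration, and {{OperationTimedOut}} is again a stand-in class. It is written as a synchronous wrapper for simplicity, whereas the real change would live in the callback path.

```python
import time


class OperationTimedOut(Exception):
    """Stand-in for the driver's client-side timeout exception."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on OperationTimedOut with exponential backoff.

    The wait before retry number n (starting at 0) is base_delay * 2**n.
    After max_attempts failed calls the last timeout is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except OperationTimedOut:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller report the failure
            sleep(base_delay * (2 ** attempt))
```

Passing {{sleep}} as a parameter is only there to make the helper easy to exercise without real delays.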

To handle coordinator-level timeouts, we could subclass {{cassandra.policies.RetryPolicy}}
with an {{on_read_timeout()}} that performs exponential backoff for a limited number of attempts.
You can pass an instance of this to the Cluster constructor: {{Cluster(..., default_retry_policy=foo)}}.

Sorry for the additional work; I just don't want to end up with a {{COPY TO}} that goes fast
enough to hit timeouts without any sort of recourse for users.  That's something that we already
have a bit of a problem with for {{COPY FROM}}.

> COPY TO improvements
> --------------------
>
>                 Key: CASSANDRA-9304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Minor
>              Labels: cqlsh
>             Fix For: 3.x, 2.1.x, 2.2.x
>
>
> COPY FROM has gotten a lot of love.  COPY TO not so much.  One obvious improvement could
> be to parallelize reading and writing (write one page of data while fetching the next).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
