cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9304) COPY TO improvements
Date Wed, 21 Oct 2015 09:18:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966519#comment-14966519 ]

Stefania commented on CASSANDRA-9304:
-------------------------------------

The latest changes are ready for review.

In the end I decided to implement exponential back-off only for server-side timeouts, that is,
only in the retry policy. Retrying after a driver timeout ({{OperationTimedOut}}) is problematic:
not only would we need to keep track of how many pages we have already received, but we might
also retrieve more data from the server, which results in duplicated data. So instead I increase
the driver timeout with the page size (currently 10 seconds per 1000 entries in the page size,
which may be a bit too much). This should eliminate driver-side timeouts that result in more
data being received from the server.

An {{OperationTimedOut}} that is still received would then signal a real connection problem. In
this case it is the parent process that may resubmit the same token range later on, up to a
maximum number of times and provided that we have received no data for that range yet. The same
applies to any error that a worker process reports for a range. If we have already received data
for that range, I decided against retrying, to avoid duplicating data. I hope this makes sense;
let me know if you have other preferences on how to implement the back-off and retry mechanism.
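
As a rough illustration of the rules described above (a sketch only, not the code in the patch),
the three decisions could look something like this in Python. The function names, the back-off
base and cap, and the value of {{MAX_RANGE_ATTEMPTS}} are all hypothetical; only the figure of
10 seconds per 1000 entries comes from the description above.

```python
# Illustrative sketch; every name and constant here is an assumption
# except the 10 seconds per 1000 entries figure quoted in the comment.
SECONDS_PER_1000_ROWS = 10.0
MAX_RANGE_ATTEMPTS = 5  # assumed cap; the comment only says "a maximum number of times"


def page_timeout(page_size):
    """Driver-side timeout in seconds, growing linearly with the page size."""
    return SECONDS_PER_1000_ROWS * page_size / 1000.0


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying after a server-side timeout.

    The delay doubles with each attempt (0, 1, 2, ...) and is capped.
    """
    return min(cap, base * (2 ** attempt))


def should_resubmit(attempts, rows_received):
    """Parent-side decision for a token range whose worker reported an error.

    A range is resubmitted only if no rows have been received for it yet
    (otherwise the export would contain duplicates) and the attempt cap
    has not been reached.
    """
    return rows_received == 0 and attempts < MAX_RANGE_ATTEMPTS


if __name__ == "__main__":
    print(page_timeout(5000))                     # timeout for a 5000-row page
    print([backoff_delay(n) for n in range(4)])   # delays for the first four retries
    print(should_resubmit(attempts=1, rows_received=0))
    print(should_resubmit(attempts=1, rows_received=42))
```

Keeping the back-off in the server-side path and the resubmission rule in the parent means a
range is never fetched twice once any of its rows have been written.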

I've also done the following:

* enhanced debug messages and error logging 
* fixed COPY command completions
* added monitoring of child processes in case they die without sending the termination flag
on the pipe
* fixed possible concurrent access to {{ExportSession.jobs}}

Still to do:

* Moving the code to a separate file
* Testing on Windows

> COPY TO improvements
> --------------------
>
>                 Key: CASSANDRA-9304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Minor
>              Labels: cqlsh
>             Fix For: 3.x, 2.1.x, 2.2.x
>
>
> COPY FROM has gotten a lot of love.  COPY TO not so much.  One obvious improvement could
> be to parallelize reading and writing (write one page of data while fetching the next).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
