cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-11053) COPY FROM on large datasets: fix progress report and debug performance
Date Thu, 03 Mar 2016 06:03:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173201#comment-15173201 ]

Stefania edited comment on CASSANDRA-11053 at 3/3/16 6:03 AM:
--------------------------------------------------------------

Thank you for your latest comments and for introducing {{deserializers.DesBytesTypeByteArray}}.

I've fixed the typo, removed {{LibevConnection}} (which, as you pointed out, was already the default) and added {{DesBytesTypeByteArray}} if available, in [this commit|https://github.com/stef1927/cassandra/commit/2d10a8dc7d369324ecfd5d2457c15cf716243d98].
I've saved the commit history on this [historical branch|https://github.com/stef1927/cassandra/commits/11053-2.1-historical].
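
For illustration only, an "if available" check of this kind could be sketched as follows. This is not the code from the linked commit: the helper name {{optional_attr}} is hypothetical, and the sketch assumes that {{deserializers}} refers to the Python driver's {{cassandra.deserializers}} module, which may be missing in some installations.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module_name.attr_name, or None if the module or attribute is missing.

    Hypothetical helper for illustration; not taken from the patch.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or its parent package) is not installed or not built.
        return None
    return getattr(module, attr_name, None)


# Assumption: the deserializer lives in cassandra.deserializers when present.
byte_array_deserializer = optional_attr("cassandra.deserializers",
                                        "DesBytesTypeByteArray")

if byte_array_deserializer is None:
    print("DesBytesTypeByteArray not available, keeping the default deserializer")
else:
    print("DesBytesTypeByteArray available")
```

A caller would then opt in to the faster deserializer only when the returned value is not {{None}}, so older driver versions keep working unchanged.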

I've squashed and up-merged:

||2.1||2.2||2.2 win||3.0||3.5||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/11053-2.1]|[patch|https://github.com/stef1927/cassandra/commits/11053-2.2]| |[patch|https://github.com/stef1927/cassandra/commits/11053-3.0]|[patch|https://github.com/stef1927/cassandra/commits/11053-3.5]|[patch|https://github.com/stef1927/cassandra/commits/11053]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.2-dtest/]|[win dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.2-windows-dtest_win32/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-3.5-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-dtest/]|

-No patch merges cleanly, all up-merges have conflicts.-
There are conflicts all the way up to 3.5; the only patch that merges cleanly is 3.5 into trunk.

CI is still pending.


was (Author: stefania):
Thank you for your latest comments and for introducing {{deserializers.DesBytesTypeByteArray}}.

I've fixed the typo, removed {{LibevConnection}} (which, as you pointed out, was already the default) and added {{DesBytesTypeByteArray}} if available, in [this commit|https://github.com/stef1927/cassandra/commit/2d10a8dc7d369324ecfd5d2457c15cf716243d98].
I've saved the commit history on this [historical branch|https://github.com/stef1927/cassandra/commits/11053-2.1-historical].

I've squashed and up-merged:

||2.1||2.2||2.2 win||3.0||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/11053-2.1]|[patch|https://github.com/stef1927/cassandra/commits/11053-2.2]| |[patch|https://github.com/stef1927/cassandra/commits/11053-3.0]|[patch|https://github.com/stef1927/cassandra/commits/11053]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.2-dtest/]|[win dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-2.2-windows-dtest_win32/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11053-dtest/]|

No patch merges cleanly; all up-merges have conflicts.

CI is still pending.

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt, copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt, worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed two issues:
> * The progress report is incorrect: it is very slow until almost the end of the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with a smaller cluster locally (approx 35,000 rows per second). As a comparison, cassandra-stress manages 50,000 rows per second under the same set-up, which makes it about 1.5 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
