cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests
Date Sat, 30 Jul 2016 02:48:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400395#comment-15400395 ]

Stefania commented on CASSANDRA-11521:
--------------------------------------

The patch is ready for review:

||trunk|[patch|https://github.com/stef1927/cassandra/commits/11521]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11521-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11521-dtest/]|

There are also the [driver patch|https://github.com/stef1927/java-driver/commits/11521] and
the [spark connector patch|https://github.com/stef1927/spark-cassandra-connector/commits/11521].
For these I plan to create tickets for the respective projects once the native protocol changes
have been finalized.

A [design document|https://docs.google.com/document/d/1YqKGSU1P8EJIfMrO--29VaSoCy5mUu-ePfAiIOLsY7o/edit]
is also available.

The Spark benchmark results are available in [this comment|https://issues.apache.org/jira/browse/CASSANDRA-9259?focusedCommentId=15400394&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15400394]
on the parent ticket. The final patch is slightly better than the proof-of-concept, and the
asynchronous paging mechanism significantly outperforms the existing mechanism for large data
sets.

I've also repeated some cstar_perf tests to rule out performance regressions with ordinary
queries, which are not in the optimized path:

* Single partition queries (default cassandra-stress read command) at CL.LOCAL_ONE (the cassandra-stress
default): [first run|http://cstar.datastax.com/graph?command=one_job&stats=8b1f1d54-53e4-11e6-85af-0256e416528f&metric=99th_latency&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=276.98&ymin=0&ymax=22.33],
[second run with the revision order swapped|http://cstar.datastax.com/graph?command=one_job&stats=1abd3fe4-545e-11e6-8920-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=277.86&ymin=0&ymax=243951.4],
[an old run|http://cstar.datastax.com/graph?command=one_job&stats=16cef080-53dc-11e6-b967-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=282.92&ymin=0&ymax=249571.3]
done before enabling token-aware routing in cassandra-stress.

* Single partition queries at CL.ALL: [single run|http://cstar.datastax.com/graph?command=one_job&stats=e2155410-5462-11e6-9cd7-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=277.75&ymin=0&ymax=246123.9]

There is a gap of 3.6K ops/second without token-aware routing and 1K ops/second at CL.ALL. With
token-aware routing the patch is instead 1K ops/second faster. These differences must arise from
the refactoring of the select statement. They are very small, and the test error seems to be
around 0.5K ops/second, but I can look into it further if there are concerns.

> Implement streaming for bulk read requests
> ------------------------------------------
>
>                 Key: CASSANDRA-11521
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>              Labels: client-impacting, protocolv5
>             Fix For: 3.x
>
>         Attachments: final-patch-jfr-profiles-1.zip
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer and eliminating
> the need to query individual pages one by one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
