cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9673) Improve batchlog write path
Date Thu, 16 Jul 2015 07:25:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629357#comment-14629357 ]

Stefania commented on CASSANDRA-9673:
-------------------------------------

I've attached an archive containing .jfr files for trunk and the patched branch. I generated
the files using the following dtest:

{code}
    def logged_batch_stress_test(self):
        """
        @jira_ticket CASSANDRA-9673, a stress test to record any improvements in GC usage
        """
        cluster = self.cluster

        cluster.populate(3)
        cluster.start(wait_other_notice=True, jvm_args=["-XX:+UnlockCommercialFeatures", "-XX:+FlightRecorder"])
        nodes = cluster.nodelist()

        self._start_jfr_recording(nodes)

        nodes[0].stress(['user', 'profile=/home/stefania/git/cstar/9673.yaml', 'ops(insert=1,)',
                         'n=50000', '-rate', 'threads=8'])

        self._dump_jfr_recording(nodes)

    def _start_jfr_recording(self, nodes):
        """
        Start jfr recording provided the cluster was started with
        jvm_args=["-XX:+UnlockCommercialFeatures", "-XX:+FlightRecorder"]
        """
        for node in nodes:
            p = subprocess.Popen(['jcmd', str(node.pid), 'JFR.start'],
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            stdout, stderr = p.communicate()
            debug(stdout)
            debug(stderr)

    def _dump_jfr_recording(self, nodes):
        """
        Save jfr recording to file
        """
        for node in nodes:
            p = subprocess.Popen(['jcmd', str(node.pid), 'JFR.dump', 'recording=1', 'filename=recording_{}.jfr'.format(node.address())],
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            stdout, stderr = p.communicate()
            debug(stdout)
            debug(stderr)
{code}
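
The two helpers above differ only in the arguments they pass to jcmd, so they could share one function. The following is a minimal sketch of that idea, not part of the attached patch; the names {{jfr_command}} and {{run_jfr_command}} are hypothetical, and the options are sorted only to make the resulting argument list deterministic:

```python
import subprocess


def jfr_command(pid, action, **options):
    """Build the jcmd argument list for a JFR action such as JFR.start or JFR.dump."""
    args = ['jcmd', str(pid), action]
    # jcmd options are passed as key=value tokens, e.g. recording=1
    args.extend('{}={}'.format(key, value) for key, value in sorted(options.items()))
    return args


def run_jfr_command(pid, action, **options):
    """Run jcmd for one node and return (stdout, stderr) so the caller can log them."""
    p = subprocess.Popen(jfr_command(pid, action, **options),
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    return p.communicate()
```

With this, starting a recording would be {{run_jfr_command(node.pid, 'JFR.start')}} and dumping it would be {{run_jfr_command(node.pid, 'JFR.dump', recording=1, filename=...)}}.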

9673.yaml is included in the attached archive and is also available [here|https://dl.dropboxusercontent.com/u/15683245/9673.yaml].
I couldn't find any way to use logged batches in cassandra-stress other than with
a user schema, which is the only reason the test uses one.

I've also run a cperf test with the same stress test:

http://cstar.datastax.com/tests/id/20a0d848-2b84-11e5-be06-42010af0688f

I don't notice any differences in the cperf test. I am not at all familiar with analyzing
.jfr files (this is the first time I have used FlightRecorder). If anything, it seems to me
that the patched branch uses less memory but has more GCs, though perhaps I should have used
a bigger sample. I also only looked at the coordinator .jfrs (the first node).

[~JoshuaMcKenzie], any suggestions or comments?

> Improve batchlog write path
> ---------------------------
>
>                 Key: CASSANDRA-9673
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9673
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Stefania
>             Fix For: 3.0.0 rc1
>
>         Attachments: 9673.tar.gz
>
>
> Currently we allocate an on-heap {{ByteBuffer}} to serialize the batched mutations into,
> before sending it to a distant node, generating unnecessary garbage (potentially a lot of
> it).
> With materialized views using the batchlog, it would be nice to optimise the write path:
> - introduce a new verb ({{Batch}})
> - introduce a new message ({{BatchMessage}}) that would encapsulate the mutations, expiration,
>   and creation time (similar to {{HintMessage}} in CASSANDRA-6230)
> - have MS serialize it directly instead of relying on an intermediate buffer
> To avoid merely shifting the temp buffer to the receiving side(s) we should change the
> structure of the batchlog table to use a list or a map of individual mutations.
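
The difference between serializing through an intermediate buffer and writing straight to the output can be illustrated with a small sketch. This is only an illustration in Python, not Cassandra's Java code or wire format: mutations are treated as opaque byte strings, and the length-prefixed framing and both function names are assumptions made for the example.

```python
import io
import struct


def serialize_via_buffer(mutations, out):
    """Build the whole payload in a temporary buffer, then write it once."""
    buf = bytearray()
    buf += struct.pack('>I', len(mutations))
    for m in mutations:
        buf += struct.pack('>I', len(m)) + m
    # The temporary buffer becomes garbage as soon as the write completes.
    out.write(bytes(buf))


def serialize_directly(mutations, out):
    """Write each length-prefixed mutation straight to the output stream."""
    out.write(struct.pack('>I', len(mutations)))
    for m in mutations:
        out.write(struct.pack('>I', len(m)))
        out.write(m)
```

Both functions produce the same bytes on the stream; the second one simply never holds the full serialized batch in a separate allocation.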



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
