cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair
Date Tue, 03 May 2016 05:13:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268120#comment-15268120 ]

Stefania commented on CASSANDRA-9766:
-------------------------------------

It's looking much better without recycling {{BTreeSearchIterator}}:

{code}
grep ERROR build/test/logs/TEST-org.apache.cassandra.streaming.LongStreamingTest.log
ERROR [main] 2016-05-03 10:37:04,004 SLF4J: stderr
ERROR [main] 2016-05-03 10:37:34,737 Writer finished after 25 seconds....
ERROR [main] 2016-05-03 10:37:34,738 File : /tmp/1462243029050-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR [main] 2016-05-03 10:37:55,165 Finished Streaming in 20.41 seconds: 23.52 Mb/sec
ERROR [main] 2016-05-03 10:38:15,054 Finished Streaming in 19.89 seconds: 24.14 Mb/sec
ERROR [main] 2016-05-03 10:38:56,983 Finished Compacting in 41.93 seconds: 23.09 Mb/sec
{code}

I would suggest leaving {{BTreeSearchIterator}} unrecycled. Recycling this iterator strikes me
as quite dangerous, see for example [here|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-81fd7ce7915c147ea84590e25f77ca47R361].
I think it would extend the scope and risk of this patch significantly for very little gain,
but feel free to prove me wrong if you want to experiment with alternative recycling options.
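
To make the concern concrete, here is a minimal sketch of the Netty {{Recycler}} pattern (only
the {{io.netty.util.Recycler}} API is real, the pooled iterator itself is illustrative): any
mutable field that is not reset on recycle, or any caller that keeps using the instance after
it has been recycled, silently corrupts the next user of the pool.

{code}
import io.netty.util.Recycler;

// Illustrative only: a pooled, stateful iterator in the style under discussion.
public final class PooledIterator
{
    private static final Recycler<PooledIterator> RECYCLER = new Recycler<PooledIterator>()
    {
        @Override
        protected PooledIterator newObject(Handle<PooledIterator> handle)
        {
            return new PooledIterator(handle);
        }
    };

    private final Recycler.Handle<PooledIterator> handle;
    private Object[] elements;   // mutable state: survives recycling unless explicitly reset
    private int position;

    private PooledIterator(Recycler.Handle<PooledIterator> handle)
    {
        this.handle = handle;
    }

    public static PooledIterator newInstance(Object[] elements)
    {
        PooledIterator it = RECYCLER.get();   // may hand back a previously recycled instance
        it.elements = elements;
        it.position = 0;
        return it;
    }

    public Object next()
    {
        return elements[position++];
    }

    public void recycle()
    {
        elements = null;         // if a field is forgotten here, or a caller keeps using the
        position = 0;            // iterator after this call, the next user sees stale state
        handle.recycle(this);
    }
}
{code}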


Regarding using our own {{FastThreadLocal}} vs. keeping the dependency on Netty, I'm really
not sure. On the one hand I don't want to cause additional work for no good reason and I don't
particularly like duplicating code; on the other hand the Netty internal classes, e.g.
{{InternalThreadLocalMap}}, could change at any time, so a Netty upgrade could introduce
performance regressions. I'm happy either way.
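
For reference, this is the shape of the Netty API in question; {{FastThreadLocal}} only beats
{{java.lang.ThreadLocal}} when the calling thread is a {{FastThreadLocalThread}}, because the
lookup is a constant-index read from {{InternalThreadLocalMap}} rather than a hash probe (the
scratch-buffer use case below is made up):

{code}
import io.netty.util.concurrent.FastThreadLocal;

// Illustrative use case only: a per-thread scratch buffer.
public class ScratchBuffer
{
    private static final FastThreadLocal<byte[]> BUFFER = new FastThreadLocal<byte[]>()
    {
        @Override
        protected byte[] initialValue()
        {
            return new byte[64 * 1024];
        }
    };

    public static byte[] get()
    {
        // Fast path only on FastThreadLocalThread; falls back to a regular
        // ThreadLocal-backed map on other threads.
        return BUFFER.get();
    }
}
{code}

Copying the class into our tree would pin that behaviour, at the cost of keeping it in sync
with Netty ourselves.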

Regarding ref. counting, you're quite right that we don't need it: if an object is not
recycled, it will simply be GC-ed.

A few more points:

* Why do we need to allocate cells lazily in {{BTreeRow.Builder}}? Do we really create many
of these builders without ever adding cells to them?

* [{{dob.recycle()}}|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-c06541855022eca5fd794dd24ff02f89R182]
should be in a {{finally}} block since {{serializeRowBody()}} can throw (see the sketch after
this list).

* I don't understand [this line|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-ee37e803d70421ce823d42e02620d589R207]:
when the object is recycled, the buffer should be null (from {{close()}}) and
{{indexSamplesSerializedSize}} should be zero (from {{create()}}), so why do we need to set
{{indexOffsets\[columnIndexCount\] = 0}} explicitly?

* {{ColumnIndex.create()}} is only called from BTW.append. It would be nice if we could attach
this object somewhere rather than constantly pushing it onto and popping it off the recycler
stack. We could store it directly in BTW if we could be sure that BTW.append is never called
by multiple threads, or perhaps keep a queue of these objects in BTW (sketched after this
list)?
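
Regarding the {{dob.recycle()}} point above, the shape I have in mind is just the usual
try/finally; a sketch only, where {{obtainScratchBuffer()}} is a hypothetical stand-in for
however the patch obtains the pooled buffer:

{code}
// Sketch: the pooled buffer is released in a finally block so that an exception
// thrown by serializeRowBody() cannot leak it out of the recycler.
DataOutputBuffer dob = obtainScratchBuffer();   // hypothetical helper, see lead-in
try
{
    serializeRowBody(row, flags, header, dob);
    out.write(dob.buffer());
}
finally
{
    dob.recycle();                              // runs even when serializeRowBody() throws
}
{code}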
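
And for the last point, the alternative I have in mind is roughly the following (all names are
illustrative, not the actual BTW/{{ColumnIndex}} API): the builder lives in the writer and is
reset in place on every append, which is only valid if append is never called concurrently on
the same writer; otherwise a small queue inside the writer would be needed instead.

{code}
// Illustrative shape of "attach it to the writer" instead of a recycler push/pop per append.
final class RowIndexBuilder
{
    private int entries;

    void reset()          { entries = 0; }   // re-initialise in place, no pool round trip
    void add(long offset) { entries++; }     // placeholder for the real index bookkeeping
    int finish()          { return entries; }
}

final class Writer
{
    private final RowIndexBuilder indexBuilder = new RowIndexBuilder();   // one per writer

    int append(long[] rowOffsets)
    {
        indexBuilder.reset();
        for (long offset : rowOffsets)
            indexBuilder.add(offset);
        return indexBuilder.finish();
    }
}
{code}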

> Bootstrap outgoing streaming speeds are much slower than during repair
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-9766
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>         Environment: Cassandra 2.1.2. more details in the pdf attached 
>            Reporter: Alexei K
>            Assignee: T Jake Luciani
>              Labels: performance
>             Fix For: 3.x
>
>         Attachments: problem.pdf
>
>
> I have a cluster in the Amazon cloud; it's described in detail in the attachment. What I've
> noticed is that during bootstrap we never go above 12 MB/sec transmission speeds, and those
> speeds flatline almost as if we're hitting some sort of limit (this remains true for the
> other tests that I've run). During repair, however, we see much higher, variable sending
> rates. I've provided network charts in the attachment as well. Is there an explanation for
> this? Is something wrong with my configuration, or is it a possible bug?



