cassandra-user mailing list archives

From Andrew Tolbert <andrew.tolb...@datastax.com>
Subject Re: Does Java driver v3.1.x degrade cluster connect/close performance?
Date Tue, 07 Mar 2017 14:13:59 GMT
Hi Satoshi,

> One correction on my previous email, at 2.1.8 of the driver, Netty 4.0 was
> in use, so please disregard my comments about the netty dependency changing
> from 3.9 to 4.0. There is a difference in version, but it's only at the
> patch level (4.0.27 to 4.0.37).
>
> Does your comment mean that Cluster#close takes at least 2 seconds with
> v2.1.8 of the driver? If so, it is strange because the response time of
> Cluster#close was around 20ms with v2.1.8 of the driver in my test.
>

The real reason for the two second delay was the change made for JAVA-914
<https://datastax-oss.atlassian.net/browse/JAVA-914>, which was introduced
in 2.1.9 and 3.0.x, not the Netty 3.9 to 4.0 version change. I was incorrect
about that, as the Netty change was made earlier (in driver 2.1.6).

Thanks,
Andy


On Mon, Mar 6, 2017 at 11:11 PM, Satoshi Hikida <sahikida@gmail.com> wrote:

> Hi Matija, Andrew
>
> Thank you for your reply.
>
> Matija:
> > Do you plan to misuse it and create a new cluster object and open a new
> connection for each request?
> No, my app never creates a new cluster for each request. However, each of
> its unit tests creates a new cluster and closes it every time.
> Of course, I could change the tests to create and close a cluster only once
> or a few times. But I just wondered why the connect/close performance
> degrades when I update the driver version.
>
>
> Andrew:
> Thanks for your information about the driver's ML. I'll use it next time.
>
> One correction on my previous email, at 2.1.8 of the driver, Netty 4.0
> was in use, so please disregard my comments about the netty dependency
> changing from 3.9 to 4.0. There is a difference in version, but it's only at
> the patch level (4.0.27 to 4.0.37)
> Does your comment mean that Cluster#close takes at least 2 seconds with
> v2.1.8 of the driver? If so, it is strange because the response time of
> Cluster#close was around 20ms with v2.1.8 of the driver in my test.
>
> > I'd be interested to see if running the same test in your environment
> creates different results.
> I'll run the test in my test environment and share the result. Thank you
> again.
>
> Regards,
> Satoshi
>
> On Tue, Mar 7, 2017 at 12:38 AM, Andrew Tolbert <
> andrew.tolbert@datastax.com> wrote:
>
>> One correction on my previous email, at 2.1.8 of the driver, Netty 4.0
>> was in use, so please disregard my comments about the netty dependency
>> changing from 3.9 to 4.0. There is a difference in version, but it's only at
>> the patch level (4.0.27 to 4.0.37)
>>
>> Just to double check, I reran that connection initialization test (source
>> <https://gist.github.com/tolbertam/e6ac8b71a7703a6fc4561356767a1501>)
>> where I got my previous numbers from (as that was from nearly 2 years ago)
>> and compared driver version 2.1.8 against 3.1.3.  I first ran against a
>> single node that is located in California, where my client is in Minnesota,
>> so roundtrip latency is a factor:
>>
>> v2.1.8:
>>
>> Single attempt took 1837ms.
>>
>> 10 warmup iterations (first 10 attempts discarded), 100 trials
>>
>>
>> -- Timers ----------------------------------------------------------------------
>> connectTimer
>>              count = 100
>>                min = 458.40 milliseconds
>>                max = 769.43 milliseconds
>>               mean = 493.45 milliseconds
>>             stddev = 38.54 milliseconds
>>             median = 488.38 milliseconds
>>               75% <= 495.71 milliseconds
>>               95% <= 514.73 milliseconds
>>               98% <= 724.05 milliseconds
>>               99% <= 769.02 milliseconds
>>             99.9% <= 769.43 milliseconds
>>
>> v3.1.3:
>>
>> Single attempt took 1781ms.
>>
>> 10 warmup iterations (first 10 attempts discarded), 100 trials
>>
>> -- Timers ----------------------------------------------------------------------
>> connectTimer
>>              count = 100
>>                min = 457.32 milliseconds
>>                max = 539.77 milliseconds
>>               mean = 485.68 milliseconds
>>             stddev = 10.76 milliseconds
>>             median = 485.52 milliseconds
>>               75% <= 490.39 milliseconds
>>               95% <= 499.83 milliseconds
>>               98% <= 511.52 milliseconds
>>               99% <= 535.56 milliseconds
>>             99.9% <= 539.77 milliseconds
>>
>> As you can see, at least for this test, initialization times are pretty
>> much identical.
>>
>> I ran another set of trials using a local C* node (running on same host
>> as client) to limit the impact of round trip time:
>>
>> v2.1.8:
>>
>> Single attempt took 477ms.
>>
>> 10 warmup iterations 100 trials
>>
>> -- Timers ----------------------------------------------------------------------
>> connectTimer
>>              count = 100
>>                min = 2.38 milliseconds
>>                max = 32.69 milliseconds
>>               mean = 3.79 milliseconds
>>             stddev = 3.49 milliseconds
>>             median = 3.05 milliseconds
>>               75% <= 3.49 milliseconds
>>               95% <= 6.05 milliseconds
>>               98% <= 19.55 milliseconds
>>               99% <= 32.56 milliseconds
>>             99.9% <= 32.69 milliseconds
>>
>> v3.1.3:
>>
>> Single attempt took 516ms.
>>
>> -- Timers ----------------------------------------------------------------------
>> connectTimer
>>              count = 100
>>                min = 1.67 milliseconds
>>                max = 8.03 milliseconds
>>               mean = 3.00 milliseconds
>>             stddev = 0.97 milliseconds
>>             median = 2.85 milliseconds
>>               75% <= 3.10 milliseconds
>>               95% <= 4.01 milliseconds
>>               98% <= 6.55 milliseconds
>>               99% <= 7.93 milliseconds
>>             99.9% <= 8.03 milliseconds
>>
>> Likewise, when using a local C* node, initialization times are very
>> similar.
>>
>> I'd be interested to see if running the same test
>> <https://gist.github.com/tolbertam/e6ac8b71a7703a6fc4561356767a1501> in
>> your environment creates different results.
>>
>> Thanks!
>> Andy
>>
>>
>> On Mon, Mar 6, 2017 at 8:53 AM, Andrew Tolbert <
>> andrew.tolbert@datastax.com> wrote:
>>
>>> Hi Satoshi,
>>>
>>> This question would be better suited for the 'DataStax Java Driver for
>>> Apache Cassandra mailing list
>>> <https://groups.google.com/a/lists.datastax.com/forum/#%21forum/java-driver-user>',
>>> but I do have a few thoughts about what you are observing:
>>>
>>> Between java-driver 2.1 and 3.0 the driver updated its Netty dependency
>>> from 3.9.x to 4.0.x.  Cluster#close is likely taking two seconds longer
>>> because the driver uses AbstractEventExecutor.shutdownGracefully()
>>> <https://github.com/netty/netty/blob/netty-4.0.44.Final/common/src/main/java/io/netty/util/concurrent/AbstractEventExecutor.java#L50>
>>> which waits for a quiet period of 2 seconds to allow any inflight requests
>>> to complete.  You can disable that by passing a custom NettyOptions
>>> <http://docs.datastax.com/en/drivers/java/3.1/com/datastax/driver/core/NettyOptions.html>
>>> to a Cluster.Builder using withNettyOptions, i.e.:
>>>
>>>     /**
>>>      * A custom {@link NettyOptions} that shuts down the {@link
>>> EventLoopGroup} after
>>>      * no quiet time.  This is useful for tests that consistently close
>>> clusters as
>>>      * otherwise there is a 2 second delay (from JAVA-914
>>> <https://datastax-oss.atlassian.net/browse/JAVA-914>).
>>>      */
>>>     public static NettyOptions nonQuietClusterCloseOptions = new
>>> NettyOptions() {
>>>         @Override
>>>         public void onClusterClose(EventLoopGroup eventLoopGroup) {
>>>             eventLoopGroup.shutdownGracefully(0, 15,
>>> SECONDS).syncUninterruptibly();
>>>         }
>>>     };
>>>
>>> However, I wouldn't recommend doing this unless you require Cluster.close
>>> to be as quick as possible. Closing a Cluster frequently is not something
>>> you should expect to be doing often, as a Cluster and its Session are
>>> meant to be reused over the lifetime of an application.
>>>
>>> With regards to Cluster.connect being slower, I'm not sure I have an
>>> explanation for that, and it is not something I have noticed.  I would not
>>> expect Cluster.connect to even take a second with a single node cluster
>>> (for example, I recorded some numbers
>>> <https://datastax-oss.atlassian.net/browse/JAVA-692?focusedCommentId=21428&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-21428>
>>> a while back, and the mean initialization time with a 40 node cluster with
>>> auth was ~251ms).  Have you tried executing several trials of
>>> Cluster.connect within a single JVM process? Does the initialization time
>>> improve with a subsequent Cluster.connect?  I'm wondering if maybe there is
>>> some additional first-time initialization required that was not needed before.
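Running repeated trials in one JVM can be done with a small harness along these lines. This is a minimal sketch: the `ConnectTimer` class and `timeTrials` helper are mine (not the linked gist), and the real action would be the driver's connect/close calls.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectTimer {
    // Runs the action (warmup + trials) times and returns the elapsed
    // milliseconds of the non-warmup runs, discarding the warmup iterations.
    public static List<Long> timeTrials(Runnable action, int warmup, int trials) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < warmup + trials; i++) {
            long start = System.nanoTime();
            action.run();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (i >= warmup) {
                millis.add(elapsedMs);
            }
        }
        return millis;
    }
}
```

With the driver on the classpath, the action would be something like `() -> { Cluster c = builder.build(); c.connect(); c.close(); }`; a first-time initialization effect would show up as the warmup iterations being noticeably slower than the rest.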
>>>
>>> Thanks,
>>> Andy
>>>
>>> On Mon, Mar 6, 2017 at 6:01 AM, Matija Gobec <matija0204@gmail.com>
>>> wrote:
>>>
>>>> Interesting question, since I never measured connect and close times.
>>>> Usually this is something you do once when the application starts, and
>>>> that's it. Do you plan to misuse it and create a new cluster object and
>>>> open a new connection for each request?
>>>>
>>>> On Mon, Mar 6, 2017 at 7:19 AM, Satoshi Hikida <sahikida@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm going to try to update the DataStax Java Driver version from
>>>>> 2.1.8 to 3.1.3.
>>>>> First, I ran a test program and measured the time with both drivers,
>>>>> v2.1.8 and v3.1.3.
>>>>>
>>>>> The test program simply builds a Cluster, connects to it, executes a
>>>>> simple select statement, and closes the Cluster.
>>>>>
>>>>> The read performance was almost the same for both versions (around
>>>>> 20ms). However, the performance of connecting to the cluster and
>>>>> closing the cluster was significantly different.
>>>>>
>>>>> The test environment is as follows:
>>>>> - EC2 instance: m4.large (2 vCPU, 8GB memory), 1 node
>>>>> - Java 1.8
>>>>> - Cassandra v2.2.8
>>>>>
>>>>> Here is the result of the test. I ran the test program several times,
>>>>> but the results were almost the same as this.
>>>>>
>>>>> | Method          | Time in sec (v2.1.8/v3.1.3) |
>>>>> |-----------------|-----------------------------|
>>>>> | Cluster#connect | 1.178/2.468                 |
>>>>> | Cluster#close   | 0.022/2.240                 |
>>>>>
>>>>> With the v3.1.3 driver, Cluster#connect() performance degraded to
>>>>> about 1/2 and Cluster#close() to about 1/100. I want to know the cause
>>>>> of these performance degradations. Could someone advise me?
>>>>>
>>>>>
>>>>> A snippet of the test program follows.
>>>>> ```
>>>>> Cluster cluster = Cluster
>>>>>     .builder()
>>>>>     .addContactPoints(endpoints)
>>>>>     .withCredentials(USER, PASS)
>>>>>     .withClusterName(CLUSTER_NAME)
>>>>>     .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
>>>>>     // .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy(DC_NAME))) // driver 2.1.8
>>>>>     .withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build())) // driver 3.1.3
>>>>>     .build();
>>>>>
>>>>> Session session = cluster.connect();
>>>>> ResultSet rs = session.execute("select * from system.local;");
>>>>>
>>>>> session.close();
>>>>> cluster.close();
>>>>> ```
>>>>>
>>>>> Regards,
>>>>> Satoshi
>>>>>
>>>>>
>>>>
>>>
>>
>
