cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5239) Fully Aysnc Server Transport (StorageProxy Layer)
Date Wed, 15 May 2013 11:09:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658247#comment-13658247 ]

Sylvain Lebresne commented on CASSANDRA-5239:
---------------------------------------------

Pushed patch for this at https://github.com/pcmanus/cassandra/compare/5239.

The principle is relatively simple: it makes the StorageProxy methods asynchronous, returning
a Guava ListenableFuture. The choice of Guava's ListenableFuture is motivated by:
* the fact that the Listenable part gives us callback-based asynchronicity for the binary
protocol.
* the Future part provides a relatively clean API for the parts of the code that still want to
block on the result.
* Guava already has a number of useful methods to compose ListenableFutures, which avoids
too much wheel reinventing.
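
To illustrate the two usage styles listed above, here is a minimal sketch. It is not the patch's code: it uses the JDK's CompletableFuture as a stand-in for Guava's ListenableFuture so that it runs without extra dependencies, and AsyncProxySketch, read, readWithCallback and readBlocking are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for an asynchronous StorageProxy-like method.
// CompletableFuture is used here in place of Guava's ListenableFuture.
final class AsyncProxySketch {

    // Returns immediately; the future completes once the simulated replica answers.
    static CompletableFuture<String> read(String key) {
        return CompletableFuture.supplyAsync(() -> "value-of-" + key);
    }

    // Callback style: the caller registers handlers and no thread waits on the result.
    static void readWithCallback(String key,
                                 Consumer<String> onResult,
                                 Consumer<Throwable> onError) {
        read(key).whenComplete((value, error) -> {
            if (error != null) {
                onError.accept(error);
            } else {
                onResult.accept(value);
            }
        });
    }

    // Blocking style: for callers that still want to wait for the result.
    static String readBlocking(String key) throws Exception {
        return read(key).get(5, TimeUnit.SECONDS);
    }
}
```

With Guava itself, the callback side would go through the listener registered on the ListenableFuture, while the blocking side would simply call get() as on any Future.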

Yet the patch turns out to be far from small because
# it reorganizes most of the SP methods, as the different phases of said methods now have to
be "chained" through Guava's Futures.transform(),
# it refactors the different response handlers a bit (by necessity, but also because I saw
the opportunity to simplify them a bit) and
# it triggers a bunch of boring but required changes in the CQL code, which needs to propagate
futures too.
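
As a rough illustration of what "chaining" phases means, the sketch below splits a read into two asynchronous phases plus a synchronous post-processing step. It is an assumption-laden example, not the patch: CompletableFuture's thenCompose/thenApply play the role of Guava's Futures.transform(), and fetchFromReplicas, resolve and the returned strings are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class ChainedPhasesSketch {

    // Phase 1 (hypothetical): ask the replicas for their versions of a row.
    static CompletableFuture<List<String>> fetchFromReplicas(String key) {
        return CompletableFuture.supplyAsync(() -> Arrays.asList(key + "@v1", key + "@v2"));
    }

    // Phase 2 (hypothetical): reconcile the responses; here the last one wins.
    static CompletableFuture<String> resolve(List<String> responses) {
        return CompletableFuture.supplyAsync(() -> responses.get(responses.size() - 1));
    }

    // The whole operation is a single future built by chaining the phases.
    static CompletableFuture<String> read(String key) {
        return fetchFromReplicas(key)
                .thenCompose(ChainedPhasesSketch::resolve)  // chain an asynchronous phase
                .thenApply(row -> "row[" + row + "]");      // chain a synchronous step
    }
}
```

Because each phase only returns a future, callers up the stack (the CQL layer in the patch's case) have to return futures as well, which is why the change tends to propagate.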

A notable complication is the handling of timeouts. Since we're asynchronous, we're using
a separate timer to trigger the timeout. For that the patch reuses Netty's HashedWheelTimer
since it's designed exactly for that type of I/O timeout (and we depend on Netty now anyway).
The code for that is in the TimeoutingFuture class.
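
The general idea of a timer-driven timeout can be sketched as follows. This is not the TimeoutingFuture class from the patch: it assumes a JDK ScheduledExecutorService in place of Netty's HashedWheelTimer and CompletableFuture in place of ListenableFuture, and TimeoutSketch and withTimeout are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {

    // One shared daemon timer thread, standing in for Netty's HashedWheelTimer.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeout-timer");
                t.setDaemon(true);
                return t;
            });

    // Returns a future that fails with TimeoutException unless `operation`
    // completes (normally or exceptionally) within the given delay.
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> operation,
                                                long timeout, TimeUnit unit) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // The timer, not a blocked caller thread, triggers the timeout.
        ScheduledFuture<?> timer = TIMER.schedule(() -> {
            result.completeExceptionally(new TimeoutException("operation timed out"));
        }, timeout, unit);

        // Whichever side completes `result` first wins; later attempts are no-ops.
        operation.whenComplete((value, error) -> {
            timer.cancel(false); // the timeout task is no longer needed
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
        });
        return result;
    }
}
```

Wrapping the future of an entire chained operation, rather than each individual phase, makes the timeout bound the operation as a whole.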

This patch does however *change how timeouts work*. I.e. the read/write timeout is
now a timeout for the whole StorageProxy operation. Truth be told, I think it's a good
improvement as this makes imo much more sense from a user perspective (I'm willing to bet
that most users actually believe it's the behavior currently implemented). As a side note,
I've renamed the cas_contention_timeout to conditional_request_timeout (and changed its default
value to 10s) as it now encompasses a full SERIAL query.

A minor addition of the patch is the new "Truncate", "ConditionalWrite" and
"ConditionalRead" metrics (like we have for "Read", "Write" and "Range"). The reason is that
this was made trivial (it didn't really require additional code) by a refactor made by the
patch. We might merge "ConditionalWrite" and "ConditionalRead" if we want as they have somewhat
similar performance characteristics, but I figured it doesn't cost us much to keep them separate.
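
One plausible reason such metrics become cheap once every operation returns a future is that a single generic helper can record latency on completion. The sketch below only illustrates that idea under that assumption; it does not reflect how the patch or Cassandra's metrics classes are actually structured, and OperationMetricsSketch, timed, COUNTS and TOTAL_NANOS are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class OperationMetricsSketch {

    // Per-operation counters, keyed by a metric name such as "Truncate".
    static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    // Starts the operation and records its latency when its future completes,
    // whether it succeeded or failed. The returned future carries the same outcome.
    static <T> CompletableFuture<T> timed(String name,
                                          Supplier<CompletableFuture<T>> operation) {
        long start = System.nanoTime();
        return operation.get().whenComplete((value, error) -> {
            COUNTS.computeIfAbsent(name, k -> new LongAdder()).increment();
            TOTAL_NANOS.computeIfAbsent(name, k -> new LongAdder())
                       .add(System.nanoTime() - start);
        });
    }
}
```

With a helper of this shape, adding a metric for a new kind of operation amounts to choosing a new name at the call site.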

Tests seem to work fine but I haven't really done any performance testing with this.

                
> Fully Aysnc Server Transport (StorageProxy Layer)
> -------------------------------------------------
>
>                 Key: CASSANDRA-5239
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5239
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 2.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 2.0
>
>
> Problem Statement: 
> Currently we have "rpc_min_threads, rpc_max_threads"/ "native_transport_min_threads/native_transport_max_threads";
all of the threads in the TPE are blocking and take resources. The threads are mostly sleeping,
which increases context switch costs.
> Details: 
> We should change StorageProxy methods to provide a callback which contains the location
where the results have to be written. When the response arrives, the StorageProxy callback can write
the results directly into the connection. Timeouts can be handled in the same way.
> Fixing Netty should be trivial with some refactoring in the storage proxy (currently it
is one method call for sending the request and waiting); we need a callback.
> Fixing Thrift may be harder because Thrift calls the method and expects a return value.
We might need to write a custom codec on Netty for Thrift support, which can potentially do
callbacks (a custom codec may be similar to http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html
but we don't know details about it). Another option is to update Thrift to have a callback.
> FYI, the motivation for this ticket is from another project which I am working on with a
similar proxy (blocking Netty transport); making it async gave us a 2x throughput improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
