cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10344) Optimize ReadResponse
Date Wed, 16 Sep 2015 20:43:45 GMT


Ariel Weisberg commented on CASSANDRA-10344:

Another FYI: there are some unit test failures that don't look like flaky tests.

I am still reviewing, but the gist of the first commit makes sense to me as an optimization
for the CL.ONE local read case, where the response isn't going to go over the network. The diff
for the serializers is hard to read, so I am still going through them.

* [Is this a new serialization format that is supposed to be better?|]
* [I don't see why specializing helps?|]
* UnfilteredPartitionIterators.Serializer.deserialize appears to be unused now?
* Why are we creating a BTree if we already have the results? How is that not wasted effort
in at least some cases? This seems like work that wasn't done previously, when it just serialized
the contents as they came in. Does it do the conversion to a BTree when it processes the results
anyway? What about in the remote case?
* I don't really understand the reasoning behind some of the 2nd commit. It looks like it is
moving some stuff around, but not doing much else other than switching to a varint for some
* Can you compare your changes at CL.ONE and CL.ALL to see what impact they have? I think queries
that return several partitions, or medium-sized results from a partition, from a small in-memory
data set would be helpful in understanding the impact.
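
For context on the varint point in the list above, here is a minimal sketch of why a variable-length integer can shrink small values on the wire. It uses a generic 7-bits-per-byte encoding chosen for illustration only; it is an assumption and not necessarily the format Cassandra's serializers use, and {{VarintSketch}} and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: a generic unsigned varint (7 payload bits per byte,
// low-order group first). Not taken from the Cassandra code base.
final class VarintSketch {
    // Encodes the value treated as unsigned; small values take fewer bytes.
    static byte[] encodeUnsigned(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            // More groups follow, so set the continuation bit.
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value); // final group, continuation bit clear
        return out.toByteArray();
    }

    // Reads groups until one arrives without the continuation bit.
    static long decodeUnsigned(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }
}
```

With this encoding a length or count below 128 costs one byte instead of the four a fixed-width int would take, which is the kind of saving a switch to varints is usually after.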

> Optimize ReadResponse
> ---------------------
>                 Key: CASSANDRA-10344
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0.0 rc1
> The handling of {{ReadResponse}} has quite a few inefficiencies. The way it works
is based on constraints from early versions of CASSANDRA-8099, but this doesn't make sense
anymore. This is particularly true for local responses, where we fully serialize the response
in memory only to deserialize it a short time later. But:
> # serialization/deserialization takes time, more than necessary in that case
> # we serialize in a {{DataInputBuffer}} with a default initial size, which for largish
responses might require a few somewhat costly resizings.
> So, since we're materializing the full result in memory anyway, it should be quite a lot
more efficient to materialize it in a simple list of {{ImmutableBTreePartition}} in that case.
> To a lesser extent, the serialization of a {{ReadResponse}} that goes over the wire is probably
not ideal either. Due to current assumptions of {{MessagingService}}, we need to know the
full serialized size of every response upfront, which means we do have to materialize results
in memory in this case too. Currently, we do so by serializing the full response in memory
first, and then writing that result. Here again, the serialization in memory might require
some resizing/copying, and we're fundamentally copying things twice (this could be especially
costly with largish user values).  So here too I suggest materializing the result in a list
of {{ImmutableBTreePartition}}, computing the serialized size from it and then serializing it.
This also allows better sizing of our data structures on the receiving side.
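
The "compute the serialized size, then serialize" idea from the description can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: {{Partition}} is a hypothetical stand-in for {{ImmutableBTreePartition}}, and the length-prefixed layout is invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: size the output from the materialized result first,
// then serialize once into an exactly sized buffer (no resizing, one copy).
final class SizedSerializationSketch {
    // Hypothetical stand-in for a materialized partition.
    static final class Partition {
        final byte[] key;
        final byte[] value;

        Partition(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }

        static Partition of(String key, String value) {
            return new Partition(key.getBytes(StandardCharsets.UTF_8),
                                 value.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Walks the in-memory result to get the exact size up front.
    static int serializedSize(List<Partition> partitions) {
        int size = Integer.BYTES; // partition count
        for (Partition p : partitions) {
            size += Integer.BYTES + p.key.length + Integer.BYTES + p.value.length;
        }
        return size;
    }

    // Allocates once using the precomputed size, so the buffer never grows.
    static ByteBuffer serialize(List<Partition> partitions) {
        ByteBuffer out = ByteBuffer.allocate(serializedSize(partitions));
        out.putInt(partitions.size());
        for (Partition p : partitions) {
            out.putInt(p.key.length).put(p.key);
            out.putInt(p.value.length).put(p.value);
        }
        out.flip();
        return out;
    }

    // The receiver reads the count first and can presize its list.
    static List<Partition> deserialize(ByteBuffer in) {
        int count = in.getInt();
        List<Partition> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] key = new byte[in.getInt()];
            in.get(key);
            byte[] value = new byte[in.getInt()];
            in.get(value);
            result.add(new Partition(key, value));
        }
        return result;
    }
}
```

For the local case described earlier, the same in-memory list could simply be handed to the caller without calling {{serialize}} at all, which is where the larger saving would be expected.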
