cassandra-commits mailing list archives

From "John R. Frank (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-5575) permanent client failures: attempting batch_mutate on data that serializes to more than thrift_framed_transport_size_in_mb fails forever
Date Fri, 17 May 2013 03:05:16 GMT
John R. Frank created CASSANDRA-5575:
----------------------------------------

             Summary: permanent client failures: attempting batch_mutate on data that serializes to more than thrift_framed_transport_size_in_mb fails forever
                 Key: CASSANDRA-5575
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5575
             Project: Cassandra
          Issue Type: Bug
            Reporter: John R. Frank


Since batch_mutate is a Thrift interface, it combines all of the data in a batch into a single
Thrift message.  This means that clients cannot easily predict whether a batch will exceed
thrift_framed_transport_size_in_mb.

Thrift's client libraries do not yet raise an exception on exceeding the frame size:
https://issues.apache.org/jira/browse/THRIFT-1324 

So, Cassandra clients are doomed to the infinite loop illustrated here: 
http://mail-archives.apache.org/mod_mbox/cassandra-user/201305.mbox/%3Calpine.DEB.2.00.1305101202190.25200@computableinsights.com%3E


I still don't understand why Cassandra has both of these parameters -- the second parameter
appears to be superfluous:
{code:borderStyle=solid}
# Frame size for thrift (maximum field length).
thrift_framed_transport_size_in_mb: 1500

# The max length of a thrift message, including all fields and
# internal thrift overhead.
thrift_max_message_length_in_mb: 1600
{code}

(Note the monstrous message sizes we are now using to avoid zombie clients; this is clearly
too brittle to go into production.  Is Cassandra really only for small batches?)

Possible solutions:

1) fix Thrift, catch the error inside all the Cassandra clients, subdivide the batch,
and raise a further error if an individual message is too large.

2) change batch_mutate to serialize each mutation separately and assemble the messages into
a thrift transmission controlled more directly by the client

3) plan the end-of-life of the Thrift interfaces to Cassandra and replace them with something
else -- the new "binary streaming" protocol we've been hearing about?

Other ideas?
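
For illustration only, the client-side half of option 1 (subdivide the batch, and raise an
error when a single mutation cannot fit) might look roughly like the sketch below. It does
not use any real Thrift or Cassandra client API: split_batch, MutationTooLargeError, size_of
and overhead_bytes are hypothetical names, and size_of stands in for whatever estimate of the
serialized size a client can compute. The byte limit assumes the frame size is the configured
thrift_framed_transport_size_in_mb multiplied by 1024 * 1024.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class MutationTooLargeError(ValueError):
    """Hypothetical error: one mutation alone exceeds the frame limit."""


def split_batch(
    mutations: Iterable[T],
    size_of: Callable[[T], int],
    max_frame_bytes: int,
    overhead_bytes: int = 0,
) -> Iterator[List[T]]:
    """Yield sub-batches whose estimated size stays within max_frame_bytes.

    size_of is a caller-supplied estimate of one mutation's serialized size;
    overhead_bytes is an assumed fixed per-message cost (default 0).
    """
    batch: List[T] = []
    batch_bytes = overhead_bytes
    for mutation in mutations:
        n = size_of(mutation)
        # A single mutation that cannot fit can never succeed, so fail
        # loudly instead of retrying forever.
        if overhead_bytes + n > max_frame_bytes:
            raise MutationTooLargeError(
                f"mutation of ~{n} bytes exceeds frame limit {max_frame_bytes}"
            )
        # Close the current sub-batch before it would overflow the frame.
        if batch and batch_bytes + n > max_frame_bytes:
            yield batch
            batch = []
            batch_bytes = overhead_bytes
        batch.append(mutation)
        batch_bytes += n
    if batch:
        yield batch


if __name__ == "__main__":
    # Assumed conversion from the config value in MB to a byte limit.
    limit = 15 * 1024 * 1024
    payloads = [b"x" * (6 * 1024 * 1024) for _ in range(5)]
    for sub_batch in split_batch(payloads, len, limit):
        print(len(sub_batch), "mutations in this sub-batch")
```

Each sub-batch would then be sent as its own batch_mutate call. This only works as well as
the size estimate does, which is the underlying problem: the client still has to guess at
Thrift's serialization overhead.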


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
