Our problem is not that Python is slow; our problem is that getting data from the Cassandra server is slow (while Cassandra itself is fast). Python can handle the result data a lot faster than whatever it is passing through now.
To ask a specific question: what is currently the fastest mechanism, in terms of latency, for getting data from Cassandra to a client application? I assume it is Java? We would not use any higher-level library and would prefer to go directly against Thrift (or whatever the fastest method is). We can easily write our own C++ layer, but if C++ still has to go through Thrift and Thrift is our problem, we have solved nothing. To us this appears to be much more a maturity/optimization problem in Thrift than anything to do with language benefits.
Given that our entire wait is on the Thrift call quoted below, I tend to think nothing we do (in any language) will help except optimizing Thrift or Avro?
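
For anyone who wants to check the same thing on their own client, here is a minimal Python 3 sketch of how the time spent in the decode step can be isolated with the standard-library profiler. The functions `decode_rows` and `fetch_and_decode` are hypothetical stand-ins, not Thrift or Cassandra APIs; in a real client the profiled function would be the actual read call.

```python
import cProfile
import io
import pstats
import struct


def decode_rows(payload):
    """Hypothetical stand-in for the deserialization step (not a Thrift API)."""
    count = len(payload) // 8
    # Unpack big-endian 64-bit integers, similar in spirit to a binary protocol decode.
    return list(struct.unpack(">%dq" % count, payload))


def fetch_and_decode(payload):
    """Hypothetical stand-in for a client read; a real one would hit the network."""
    return decode_rows(payload)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


# Fake payload: 1000 packed integers standing in for a serialized result set.
payload = struct.pack(">1000q", *range(1000))
rows, report = profile_call(fetch_and_decode, payload)
print(report)
```

If the decode function dominates the cumulative-time column in the report, the wait is in deserialization rather than in the application code around it.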
Thanks for the help!
I would expect C++ or Java to be substantially faster than Python.
However, I note that Hector (and I believe Pelops) don't yet use the
newest, fastest Thrift library.
On Tue, Oct 19, 2010 at 8:21 AM, Wayne <email@example.com> wrote:
> The changes seem to do the trick. We are down to about 1/2 of the original
> quorum read performance. I did not see any more errors.
> More than 3 seconds on the client side is still not acceptable to us. We
> need the data in Python, but would we be better off going through Java or
> something else to increase performance? All three seconds are taken up in
> Thrift itself (fastbinary.decode_binary(self, iprot.trans, (self.__class__,
> self.thrift_spec))) so I am not sure what other options we have.
> Thanks for your help.
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support