incubator-cassandra-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Re: Read Latency
Date Tue, 19 Oct 2010 18:23:29 GMT
It is an entire row, which is 600,000 columns. We pass a limit of 10 million
to make sure we get it all. Our issue is that Thrift itself seems to add more
overhead/latency to a read than Cassandra takes to do the read. If cfstats
for the slowest node reports 2.25s, it is not acceptable to us that the data
comes back to the client in 5.5s. After working with Jonathan we have
optimized Cassandra itself to return the quorum read in 2.7s, but we still
have about 3s getting lost in the Thrift call (fastbinary.decode_binary).

We have seen the same pattern hold for millisecond reads of a few columns,
but it is easier to look at things in seconds. If Cassandra can get the data
off of the disks in 2.25s, we expect to have the data in a Python object in
under 3s. That is a realistic expectation from our experience. All latency
should be pushed down to disk random read latency, as that should always be
what takes the longest. Everything else is passing through memory.
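
A minimal sketch of how the fetch time and the decode time could be measured
separately on the client, so the two can be compared against the cfstats
figure. The names time_stages, fetch_raw and decode are hypothetical
stand-ins, not part of Thrift or any Cassandra client; in a real test the two
stage functions would wrap the actual network read and the actual
deserialization step.

```python
import time


def time_stages(stages):
    """Run (name, callable) stages in order, passing each result to the next.

    Returns (final_result, timings), where timings maps stage name -> seconds.
    """
    timings = {}
    value = None
    for name, func in stages:
        start = time.perf_counter()
        value = func(value)
        timings[name] = time.perf_counter() - start
    return value, timings


# Hypothetical stand-in for the network read: returns raw bytes.
def fetch_raw(_):
    return b"503a|" * 1000


# Hypothetical stand-in for deserialization: turns raw bytes into objects.
def decode(raw):
    return raw.split(b"|")


result, timings = time_stages([("fetch", fetch_raw), ("decode", decode)])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
```

With real stage functions, a decode figure that is larger than the fetch
figure would point at client-side deserialization rather than at the server.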


On Tue, Oct 19, 2010 at 2:06 PM, aaron morton <aaron@thelastpickle.com> wrote:

> Wayne,
> I'm calling cassandra from Python and have not seen too many 3 second
> reads.
>
> Your last email with log messages in it looks like you are asking for
> 10,000,000 columns. How much data is this request actually transferring to
> the client? The column names suggest only a few.
>
> DEBUG [pool-1-thread-64] 2010-10-18 19:25:28,867 StorageProxy.java (line
> 471) strongread reading data for SliceFromReadCommand(table='table',
> key='key1', column_parent='QueryPath(columnFamilyName='fact',
> superColumnName='null', columnName='null')', start='503a', finish='503a7c',
> reversed=false, count=10000000) from 698@/x.x.x.6
>
> Aaron
>
> On 20 Oct 2010, at 06:18, Jonathan Ellis wrote:
>
> > I would expect C++ or Java to be substantially faster than Python.
> > However, I note that Hector (and I believe Pelops) don't yet use the
> > newest, fastest Thrift library.
> >
> > On Tue, Oct 19, 2010 at 8:21 AM, Wayne <wav100@gmail.com> wrote:
> >> The changes seem to do the trick. We are down to about 1/2 of the original
> >> quorum read performance. I did not see any more errors.
> >>
> >> More than 3 seconds on the client side is still not acceptable to us. We
> >> need the data in Python, but would we be better off going through Java or
> >> something else to increase performance? All three seconds are taken up in
> >> Thrift itself (fastbinary.decode_binary(self, iprot.trans, (self.__class__,
> >> self.thrift_spec))) so I am not sure what other options we have.
> >>
> >> Thanks for your help.
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
>
>
