incubator-cassandra-user mailing list archives

From Edmond Lau <edm...@ooyala.com>
Subject Re: repeated timeouts on quorum reads
Date Tue, 20 Oct 2009 01:20:15 GMT
On Mon, Oct 19, 2009 at 6:01 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> are there many rows like this?

No - just a handful.  I'm able to repro by just launching 5 or 6
threads that all read the same key.
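
A minimal sketch of that kind of repro harness, for illustration only. The `read` callable is a hypothetical stand-in for the real quorum get_slice call on the problem row; the class and method names are made up here and are not part of Cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentReadRepro {

    /**
     * Runs the same read from `threads` workers at once and returns how many
     * attempts threw an exception (for example a timeout).
     * `read` is a placeholder for the actual client call (assumption).
     */
    public static int countFailures(Callable<?> read, int threads, int readsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // Each worker repeats the read and counts its own failures.
                results.add(pool.submit(() -> {
                    int failures = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        try {
                            read.call();
                        } catch (Exception e) {
                            failures++;
                        }
                    }
                    return failures;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // blocks until that worker is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for readers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("reader task failed unexpectedly", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in read that just sleeps; swap in the real quorum read here.
        int failures = countFailures(() -> { Thread.sleep(10); return null; }, 6, 5);
        System.out.println("failed reads: " + failures);
    }
}
```

With the real read plugged in, 5 or 6 threads against the same key should be enough to see the failure count climb.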

>
> did you check the logs on the other nodes for exceptions?

Yes - no exceptions.

I did find one point of note though.  Most of the quorum reads fail
with 0 or 1 responses, but I see some that fail with 2 responses,
which is odd given that only 2 responses were needed:

ERROR [pool-1-thread-18] 2009-10-20 01:17:26,266 Cassandra.java (line
679) Internal error processing get_slice
java.lang.RuntimeException: java.util.concurrent.TimeoutException:
Operation timed out - received only 2 responses from
172.16.129.75:7000172.16.129.72:7000 .
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException: Operation timed out
- received only 2 responses from 172.16.129.75:7000172.16.129.72:7000
.
        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more
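
For reference, the arithmetic behind "only 2 responses were needed" can be sketched as below. This assumes the usual majority rule for a quorum (replication factor / 2 + 1, integer division); the class and method names are illustrative and are not Cassandra's own code.

```java
public class QuorumMath {

    /** Majority of replicas under the assumed rule: floor(rf / 2) + 1. */
    public static int quorumFor(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** True when the number of responses reaches the quorum. */
    public static boolean isSatisfied(int responses, int replicationFactor) {
        return responses >= quorumFor(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // replication factor from this cluster
        for (int responses = 0; responses <= rf; responses++) {
            System.out.println(responses + " responses, quorum met: "
                    + isSatisfied(responses, rf));
        }
    }
}
```

By count alone, 0 or 1 responses falls short of a quorum at replication factor 3, while 2 responses should have been enough, which is why the trace above looks odd.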

>
> On Mon, Oct 19, 2009 at 7:40 PM, Edmond Lau <edmond@ooyala.com> wrote:
>> Usually I'm trying to read 500 columns (~250KB) out of the 30K columns
>> (~15MB) of the supercolumn.  But the same issues happen when I drop
>> down to 100 (~50KB) columns.  The columns I request from get_slice()
>> are named, i.e. I'm not reading 500 consecutive columns.
>>
>> Edmond
>>
>> On Mon, Oct 19, 2009 at 5:36 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> How much of the row that fails are you trying to read at once?
>>>
>>> On Mon, Oct 19, 2009 at 7:30 PM, Edmond Lau <edmond@ooyala.com> wrote:
>>>> Whenever I try to do a quorum read on a row with a particularly large
>>>> supercolumn with get_slice under high load, cassandra throws timeouts.
>>>>  The reads for that row repeatedly fail until load decreases, but
>>>> smaller reads still succeed during that time.  bin/nodeprobe info
>>>> shows that the read latency for the column family spikes up to 6-8
>>>> seconds.  I've run into this issue since I started to play with
>>>> cassandra, but thought that it might go away with beefier nodes.  I've
>>>> since gotten more powerful machines, but the timeouts still happen.
>>>>
>>>> Some details:
>>>> - cassandra 0.4.1
>>>> - 5 nodes, each with 12-core 800MHz with 8GB RAM, 5GB heap size
>>>> - replication factor of 3
>>>> - RandomPartitioner
>>>> - row that fails has a supercolumn with ~30K subcolumns, ~500 bytes
>>>> per cell, ~15MB total
>>>> - my failed quorum read lists 500 columns to read in the get_slice
>>>> call, but the same happens even when I read 100.
>>>>
>>>> The nodes time out with either 0 or 1 responses (2 of 3 required for a
>>>> quorum read):
>>>>
>>>> ERROR [pool-1-thread-24] 2009-10-20 00:07:43,851 Cassandra.java (line
>>>> 679) Internal error processing get_slice
>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>> Operation timed out - received only 0 responses from  .
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
>>>>        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>>>>        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>>>>        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>        at java.lang.Thread.run(Thread.java:619)
>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out
>>>> - received only 0 responses from  .
>>>>        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>>        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
>>>>        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>>>>        ... 9 more
>>>>
>>>> ERROR [pool-1-thread-32] 2009-10-19 23:47:21,045 Cassandra.java (line
>>>> 679) Internal error processing get_slice
>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
>>>> Operation timed out - received only 1 responses from
>>>> 172.16.129.75:7000 .
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
>>>>        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>>>>        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>>>>        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
>>>>        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
>>>>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>        at java.lang.Thread.run(Thread.java:619)
>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out
>>>> - received only 1 responses from 172.16.129.75:7000 .
>>>>        at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>>        at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
>>>>        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
>>>>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>>>>        ... 9 more
>>>>
>>>> Any ideas what the issue might be?
>>>>
>>>> Edmond
>>>>
>>>
>>
>
