cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Large number of ROW-READ-STAGE pending tasks?
Date Fri, 08 Jan 2010 21:21:12 GMT
If the number of queued reads keeps increasing, you're going to OOM eventually,
and the node will probably freeze first (from the clients' perspective) while
it desperately tries to GC enough memory to continue. I would restart the
affected nodes.
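The deciding question above is whether the ROW-READ-STAGE backlog is *growing*, not just whether it is large. A minimal sketch of that check follows. It assumes you have already captured two `tpstats` snapshots as text, taken some time apart, in the `NAME, pending tasks=N` format quoted below. It does not run nodeprobe itself. The class and method names (`TpstatsBacklog`, `parse`, `growing`) are hypothetical, and the second sample in `main` is made-up illustrative data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Cassandra): parses text shaped like the
 * nodeprobe tpstats output in this thread and reports stages whose pending
 * task count grew between two snapshots.
 */
public class TpstatsBacklog {
    // Matches lines such as "ROW-READ-STAGE, pending tasks=2006170".
    private static final Pattern LINE =
            Pattern.compile("^\\s*([A-Z0-9-]+), pending tasks=(\\d+)\\s*$");

    /** Parses one snapshot into stage name -> pending task count. */
    public static Map<String, Long> parse(String output) {
        Map<String, Long> pending = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                pending.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return pending;
    }

    /** Returns stage -> growth for stages whose pending count rose by more than minGrowth. */
    public static Map<String, Long> growing(Map<String, Long> before,
                                            Map<String, Long> after,
                                            long minGrowth) {
        Map<String, Long> grew = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long prev = before.getOrDefault(e.getKey(), 0L);
            long delta = e.getValue() - prev;
            if (delta > minGrowth) {
                grew.put(e.getKey(), delta);
            }
        }
        return grew;
    }

    public static void main(String[] args) {
        // First snapshot uses values from this thread; the second is hypothetical.
        String t0 = "ROW-READ-STAGE, pending tasks=2006170\nCOMMITLOG, pending tasks=8\n";
        String t1 = "ROW-READ-STAGE, pending tasks=2100000\nCOMMITLOG, pending tasks=3\n";
        Map<String, Long> grew = growing(parse(t0), parse(t1), 0);
        // A stage listed here is accumulating work faster than it drains it.
        grew.forEach((stage, delta) -> System.out.println(stage + " grew by " + delta));
    }
}
```

A single rising sample can be noise. Comparing several snapshots over a few minutes, and alerting only when a stage keeps climbing, is a safer signal before restarting a node.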

On Fri, Jan 8, 2010 at 3:15 PM, Anthony Molinaro
<anthonym@alumni.caltech.edu> wrote:
> Hi, I had one of my machines fail last night (OOM), and upon restarting it
> about 12 hours later (have to get me some monitoring so I can restart it
> faster), I've noticed lots of errors like
>
> ERROR [pool-1-thread-6915] 2010-01-08 21:10:59,902 Cassandra.java (line 739) Internal error processing multiget_slice
> java.lang.RuntimeException: error reading key 3cd4e4ba-2fb6-446a-9dc5-96bd6737dddf
>        at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:265)
>        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:312)
>        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>        at org.apache.cassandra.service.CassandraServer.multiget_slice(CassandraServer.java:228)
>        at org.apache.cassandra.service.Cassandra$Processor$multiget_slice.process(Cassandra.java:733)
>        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>        at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:261)
>        ... 11 more
>
> on some of the nodes. Using nodeprobe, I noticed that one of the machines
> has a large and growing number of pending tasks:
>
> % cassandra-nodeprobe -host xtr-04.mkt -port 8080 tpstats
> FILEUTILS-DELETE-POOL, pending tasks=0
> MESSAGING-SERVICE-POOL, pending tasks=0
> RESPONSE-STAGE, pending tasks=0
> MESSAGE-SERIALIZER-POOL, pending tasks=12
> BOOT-STRAPPER, pending tasks=0
> ROW-READ-STAGE, pending tasks=2006170
> COMMITLOG, pending tasks=8
> MESSAGE-DESERIALIZER-POOL, pending tasks=0
> GMFD, pending tasks=0
> LB-TARGET, pending tasks=0
> CONSISTENCY-MANAGER, pending tasks=1
> ROW-MUTATION-STAGE, pending tasks=130
> MINOR-COMPACTION-POOL, pending tasks=0
> MESSAGE-STREAMING-POOL, pending tasks=0
> LOAD-BALANCER-STAGE, pending tasks=0
> MEMTABLE-FLUSHER-POOL, pending tasks=0
>
> Does this indicate some sort of impending failure?  Would a restart of the
> node or the cluster fix things?  Will it eventually get better or should
> I stop the whole cluster and restart everything (this has worked in the
> past, but requires a bit of work to accomplish).
>
> This is cassandra 0.4.1 BTW.
>
> Thanks,
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro                           <anthonym@alumni.caltech.edu>
>
