cassandra-user mailing list archives

From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Large number of ROW-READ-STAGE pending tasks?
Date Fri, 08 Jan 2010 22:48:39 GMT
So I restarted the node with the large number of ROW-READ-STAGE pending
tasks.  The timeouts are still occurring somewhat randomly, and now the
MESSAGE-SERIALIZER-POOL queue seems to be growing on one of the nodes:

% for h in 02 03 04 05 06 07 08 09 ; do echo "xtr-$h.mkt"; \
    cassandra-nodeprobe -host xtr-$h.mkt -port 8080 tpstats | grep -v tasks=0 ; done
xtr-02.mkt
MINOR-COMPACTION-POOL, pending tasks=5
xtr-03.mkt
ROW-MUTATION-STAGE, pending tasks=3
xtr-04.mkt
MINOR-COMPACTION-POOL, pending tasks=11
xtr-05.mkt
xtr-06.mkt
xtr-07.mkt
MESSAGING-SERVICE-POOL, pending tasks=4
MESSAGE-SERIALIZER-POOL, pending tasks=108
xtr-08.mkt
MESSAGING-SERVICE-POOL, pending tasks=2
MESSAGE-SERIALIZER-POOL, pending tasks=468
ROW-MUTATION-STAGE, pending tasks=2
xtr-09.mkt
ROW-MUTATION-STAGE, pending tasks=1

So I watched for a while, and the MESSAGE-SERIALIZER-POOL count seems to
go up and down quite a bit on just that one box.  I'm also still seeing
loads of timeouts on many of the boxes, so it still seems like something
might be misbehaving.  Also, since the output above was taken, the
MINOR-COMPACTION-POOL has reached zero on xtr-04 but not on xtr-02; does
that seem odd?

-Anthony

On Fri, Jan 08, 2010 at 03:21:12PM -0600, Jonathan Ellis wrote:
> if the queued reads are increasing then you're going to OOM eventually,
> and it will probably freeze (from the clients' perspective) first while
> it desperately tries to GC enough to continue.  I would restart the
> affected nodes.
> 
> On Fri, Jan 8, 2010 at 3:15 PM, Anthony Molinaro
> <anthonym@alumni.caltech.edu> wrote:
> > Hi, I had one of my machines fail last night (OOM), and upon restarting it
> > about 12 hours later (have to get me some monitoring so I can restart it
> > faster), I've noticed lots of errors like
> >
> > ERROR [pool-1-thread-6915] 2010-01-08 21:10:59,902 Cassandra.java (line 739) Internal error processing multiget_slice
> > java.lang.RuntimeException: error reading key 3cd4e4ba-2fb6-446a-9dc5-96bd6737dddf
> >        at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:265)
> >        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:312)
> >        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
> >        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
> >        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
> >        at org.apache.cassandra.service.CassandraServer.multiget_slice(CassandraServer.java:228)
> >        at org.apache.cassandra.service.Cassandra$Processor$multiget_slice.process(Cassandra.java:733)
> >        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
> >        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> >        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >        at java.lang.Thread.run(Thread.java:619)
> > Caused by: java.util.concurrent.TimeoutException: Operation timed out.
> >        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
> >        at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:261)
> >        ... 11 more
> >
> > On some of the nodes.  Using nodeprobe, I noticed one of the machines
> > has a large and growing number of pending tasks:
> >
> > % cassandra-nodeprobe -host xtr-04.mkt -port 8080 tpstats
> > FILEUTILS-DELETE-POOL, pending tasks=0
> > MESSAGING-SERVICE-POOL, pending tasks=0
> > RESPONSE-STAGE, pending tasks=0
> > MESSAGE-SERIALIZER-POOL, pending tasks=12
> > BOOT-STRAPPER, pending tasks=0
> > ROW-READ-STAGE, pending tasks=2006170
> > COMMITLOG, pending tasks=8
> > MESSAGE-DESERIALIZER-POOL, pending tasks=0
> > GMFD, pending tasks=0
> > LB-TARGET, pending tasks=0
> > CONSISTENCY-MANAGER, pending tasks=1
> > ROW-MUTATION-STAGE, pending tasks=130
> > MINOR-COMPACTION-POOL, pending tasks=0
> > MESSAGE-STREAMING-POOL, pending tasks=0
> > LOAD-BALANCER-STAGE, pending tasks=0
> > MEMTABLE-FLUSHER-POOL, pending tasks=0
> >
> > Does this indicate some sort of impending failure?  Would a restart of the
> > node or the cluster fix things?  Will it eventually get better, or should
> > I stop the whole cluster and restart everything (this has worked in the
> > past, but requires a bit of work to accomplish)?
> >
> > This is cassandra 0.4.1 BTW.
> >
> > Thanks,
> >
> > -Anthony
> >
> > --
> > ------------------------------------------------------------------------
> > Anthony Molinaro                           <anthonym@alumni.caltech.edu>
> >
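
On the "get me some monitoring" point quoted above, a crude threshold
check along these lines might be enough to flag a node whose
ROW-READ-STAGE backlog is building up before it OOMs (the limit, host
list, and mail address are only placeholders, and the grep/sed assumes
the tpstats output format shown above):

% cat check_row_read_stage.sh
#!/bin/sh
# Warn if ROW-READ-STAGE pending tasks on any node exceeds LIMIT.
LIMIT=10000
for h in 02 03 04 05 06 07 08 09 ; do
  pending=`cassandra-nodeprobe -host xtr-$h.mkt -port 8080 tpstats \
    | grep ROW-READ-STAGE | sed 's/.*tasks=//'`
  if [ "${pending:-0}" -gt "$LIMIT" ]; then
    echo "xtr-$h.mkt ROW-READ-STAGE pending=$pending" \
      | mail -s "cassandra read backlog on xtr-$h.mkt" ops@example.com
  fi
done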

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>
