incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Decommissioning node is causing broken pipe error
Date Wed, 04 May 2011 08:42:44 GMT
It's no longer recommended to run nodetool compact regularly, as it can mean that some tombstones
do not get purged for a very long time. Minor compaction is all you need to keep things
in check; however, 658 seems like a lot of SSTables. Were some of these compacted files? How
many SSTables are reported via JConsole for the CFs?

As for the error, the receiving side logs it at DEBUG level when the connection fails.
I'm not sure how useful it will be, but if you set logging to DEBUG for the org.apache.cassandra.net.IncomingStreamReader
logger, it may help identify what the receiving end saw when the socket closed.
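
A minimal sketch of that logging change, assuming the receiving node uses the log4j properties file that ships with 0.7 (conf/log4j-server.properties is my assumption; adjust the path if your config lives elsewhere). The logger name is the one mentioned above, and the line itself is plain log4j 1.x syntax:

```properties
# Assumption: added to conf/log4j-server.properties on the receiving node.
# Raises only this one logger to DEBUG; the root logger level is left alone.
log4j.logger.org.apache.cassandra.net.IncomingStreamReader=DEBUG
```

Depending on your setup, the node may need a restart to pick this up. Remember to remove the line once you have what you need.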

It would also be interesting to know if it fails at the exact same place every time. The progress
value in the INFO log messages on the receiving side says how many bytes have been received.
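
If it helps, here is a rough Python sketch for comparing those progress values across attempts. It assumes the retry messages look like the StreamInSession line quoted further down ("progress=<received>/<total> - NN% from ... failed: requesting a retry."); the function names are mine, not anything from Cassandra.

```python
import re

# Matches the progress fragment of a "failed: requesting a retry" message.
# \s+ is used between tokens so the pattern still matches if the log line
# has been wrapped by a mail client or terminal.
RETRY_RE = re.compile(
    r"progress=(\d+)/(\d+)\s+-\s+\d+%\s+from\s+\S+\s+failed"
)

def failed_offsets(log_text):
    """Return (bytes_received, total_bytes) for each failed-stream message."""
    return [(int(done), int(total)) for done, total in RETRY_RE.findall(log_text)]

def fails_at_same_place(log_text):
    """True if every failed-stream message reports the same received-byte count."""
    received = {done for done, _total in failed_offsets(log_text)}
    return len(received) == 1
```

Feed it the contents of the receiving node's system log after a few attempts. If fails_at_same_place returns True, that points more towards something in the file (or a size limit) than a flaky network.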


Are the nodes in the same AZ? The same region? Anything interesting with the networking?

Check nodetool ring to see if the node you are trying to decommission still owns its token.
This will also tell you if the other nodes still see it as leaving.

If you cannot get the node to decommission, you could try shutting it down and using nodetool
removetoken from another node: http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely

One other thing, check the data directory on the receiving side. It may still have some partially
written tmp files from the failed streaming. 
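
A small Python sketch for spotting those leftovers. It assumes temporary SSTable components carry a "-tmp-" marker in the file name (for example MeterRecords-tmp-f-2283-Data.db; that naming is my assumption for 0.7, so check it against what you actually see on disk). The function name is hypothetical, and it only lists files, it does not delete anything.

```python
import os

def find_tmp_sstables(data_dir):
    """Walk data_dir and return sorted (path, size_in_bytes) pairs for every
    file whose name contains the '-tmp-' marker (assumed tmp-file naming)."""
    hits = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if "-tmp-" in name:
                path = os.path.join(root, name)
                hits.append((path, os.path.getsize(path)))
    return sorted(hits)
```

Run it against each data directory on the receiving node, e.g. find_tmp_sstables("/raiddrive/MDR"), and only clean up what it finds once you are sure no streaming is in progress.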
 
Hope that helps. 



On 4 May 2011, at 12:29, tamara.alexander@accenture.com wrote:

> Hi all,
>  
> I ran decommission on a node in my 32 node cluster. After about an hour of streaming
> files to another node, I got this error on the node being decommissioned:
> INFO [MiscStage:1] 2011-05-03 21:49:00,235 StreamReplyVerbHandler.java (line 58) Need to re-stream file /raiddrive/MDR/MeterRecords-f-2283-Data.db to /10.206.63.208
> ERROR [Streaming:1] 2011-05-03 21:49:01,580 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.io.IOException: Broken pipe
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Broken pipe
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>         at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
>         at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:105)
>         at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:67)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> ERROR [Streaming:1] 2011-05-03 21:49:01,581 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[Streaming:1,1,main]
> java.lang.RuntimeException: java.io.IOException: Broken pipe
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Broken pipe
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>         at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
>         at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:105)
>         at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:67)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
>  
> And this message on the node that it was streaming to:
> INFO [Thread-333] 2011-05-03 21:49:00,234 StreamInSession.java (line 121) Streaming of file /raiddrive/MDR/MeterRecords-f-2283-Data.db/(98605680685,197932763967)
>          progress=49016107008/99327083282 - 49% from org.apache.cassandra.streaming.StreamInSession@33721219 failed: requesting a retry.
>  
> I tried running decommission again (and running scrub + decommission), but I keep getting
> this error on the same file.
>  
> I checked out the file and saw that it is a lot bigger than all the other sstables…
> 184GB instead of about 74MB. I haven’t run a major compaction for a bit, so I’m trying
> to stream 658 sstables.
>  
> I’m using Cassandra 0.7.4, I have two data directories (I know that’s not good practice…),
> and all my nodes are on Amazon EC2.
>  
> Any thoughts on what could be going on or how to prevent this?
>  
> Thanks!
> Tamara

