Check for other errors about timing out during hint replay.

  • What would be the best way to recover from this situation?
If they are really causing trouble, drop the hints via the HintedHandoffManager JMX MBean, or stop the node and delete the hint files on disk. Then run repair later.
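
A minimal sketch of the JMX route, in case it helps. It assumes the MBean is registered as org.apache.cassandra.db:type=HintedHandoffManager, that it exposes a deleteHintsForEndpoint(String) operation, and that JMX listens on the default port 7199 without authentication. Please confirm all three in jconsole for your version before relying on it. The class and helper names are made up for illustration.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DropHints {
    // Assumed MBean name; check it in jconsole for your Cassandra version.
    static final String HINTS_MBEAN = "org.apache.cassandra.db:type=HintedHandoffManager";

    // Builds the standard RMI-based JMX service URL for a host and port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Invokes the (assumed) deleteHintsForEndpoint(String) operation,
    // which should drop the hints stored for a single target endpoint.
    static void deleteHints(MBeanServerConnection mbs, String endpoint) throws Exception {
        mbs.invoke(new ObjectName(HINTS_MBEAN),
                   "deleteHintsForEndpoint",
                   new Object[] { endpoint },
                   new String[] { String.class.getName() });
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: DropHints <node-holding-hints> <endpoint-to-drop-hints-for>");
            return;
        }
        // 7199 is assumed here as the JMX port; adjust if yours differs.
        JMXServiceURL url = new JMXServiceURL(jmxUrl(args[0], 7199));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            deleteHints(connector.getMBeanServerConnection(), args[1]);
        }
    }
}
```

Run it against the node that is holding the hints, passing the address of the node the hints are destined for. The data those hints carried is gone after this, which is why repair is needed afterwards.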

  • What can be done to prevent this from happening again?
Hints are stored when either the node is down before the request starts or when the coordinator times out waiting for the remote node. Check the logs for nodes going down, and check the MessagingService MBean for TimedOuts from other nodes. This may indicate issues with a cross DC connection. 
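
If you would rather script the timeout check than click through jconsole, something like the following may work. It is only a sketch: it assumes the MBean is org.apache.cassandra.net:type=MessagingService, that it has a TimeoutsPerHost attribute returning a map of host to timeout count, and that JMX is on port 7199 without authentication. Verify the attribute name in jconsole first. The class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class TimeoutCheck {
    // Assumed MBean name; check it in jconsole for your Cassandra version.
    static final String MESSAGING_MBEAN = "org.apache.cassandra.net:type=MessagingService";

    // Returns "host=count" for every host with a non-zero timeout count,
    // sorted by host so the output is stable between runs.
    static List<String> hostsWithTimeouts(Map<String, Long> perHost) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(perHost).entrySet()) {
            if (e.getValue() != null && e.getValue() > 0) {
                result.add(e.getKey() + "=" + e.getValue());
            }
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: TimeoutCheck <node>");
            return;
        }
        // 7199 is assumed here as the JMX port; adjust if yours differs.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // "TimeoutsPerHost" is an assumed attribute name.
            Map<String, Long> perHost = (Map<String, Long>)
                    mbs.getAttribute(new ObjectName(MESSAGING_MBEAN), "TimeoutsPerHost");
            for (String line : hostsWithTimeouts(perHost)) {
                System.out.println(line);
            }
        }
    }
}
```

Hosts in the remote DC showing up with growing counts would point at the cross DC link.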

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 27/09/2013, at 11:18 PM, Tom van den Berge <tom@drillster.com> wrote:

Hi,

On one of my nodes, the (storage) load increased dramatically (doubled) within one or two hours. The hints column family was causing the growth. I noticed one HintedHandoff process that was started some two hours ago, but hadn't finished. Normally, these processes take only a few seconds, 15 seconds max, in my cluster.

The not-finishing process was handing the hints over to a host in another data center. There were no warning or error messages in the logs, other than the repeated "flushing high-traffic column family hints".
I'm using Cassandra 1.2.3.
  • What can be the reason for the handoff process not to finish?
  • What would be the best way to recover from this situation?
  • What can be done to prevent this from happening again?

Thanks in advance,
Tom