We were having occasional memory pressure issues, but we added more RAM to the nodes a few days ago and things are running more smoothly now. In general, though, nodes have not been going up and down.

 

I tried running “list HintsColumnFamily” from cassandra-cli, but it locks up my Cassandra node and never returns, forcing me to kill the Cassandra process and restart it to get the node back.

 

Here are my settings, which I believe are the defaults since I don’t remember changing them:

 

hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50

 

Grepping for “Hinted” in the system log, I get these:

INFO [HintedHandoff:1] 2012-03-13 16:13:22,215 HintedHandOffManager.java (line 373) Finished hinted handoff of 852703 rows to endpoint /192.168.20.3

INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting

INFO [ScheduledTasks:1] 2012-03-13 16:15:32,569 StatusLogger.java (line 65) HintedHandoff                     1         1         0

INFO [HintedHandoff:1] 2012-03-13 16:15:44,362 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3

INFO [HintedHandoff:1] 2012-03-13 16:21:37,266 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3

INFO [ScheduledTasks:1] 2012-03-13 16:23:07,662 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-13 16:25:49,330 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-13 16:30:52,503 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-13 16:42:22,202 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries

INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@661547256(34298224/74465815 serialized/live bytes, 78808 ops)

INFO [HintedHandoff:1] 2012-03-13 17:11:00,098 HintedHandOffManager.java (line 373) Finished hinted handoff of 44160 rows to endpoint /192.168.20.3

INFO [HintedHandoff:1] 2012-03-13 17:11:36,596 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /192.168.20.4

INFO [ScheduledTasks:1] 2012-03-13 17:12:25,248 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [HintedHandoff:1] 2012-03-13 18:47:56,151 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3

INFO [ScheduledTasks:1] 2012-03-13 18:50:24,326 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:12:48,177 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:13:57,685 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:14:57,258 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:14:58,260 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:15:59,093 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:16:59,428 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:18:01,862 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:18:01,898 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:19:04,527 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:19:04,541 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:20:07,712 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [ScheduledTasks:1] 2012-03-14 12:20:08,332 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [HintedHandoff:1] 2012-03-14 12:27:13,033 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3

INFO [ScheduledTasks:1] 2012-03-15 15:05:00,954 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [HintedHandoff:1] 2012-03-15 15:06:07,750 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries

INFO [ScheduledTasks:1] 2012-03-15 15:06:07,802 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [HintedHandoff:1] 2012-03-15 15:06:07,809 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@254668880(103911/8312880 serialized/live bytes, 63877 ops)

INFO [ScheduledTasks:1] 2012-03-15 15:07:13,503 StatusLogger.java (line 65) HintedHandoff                     1         2         0

INFO [HintedHandoff:1] 2012-03-15 15:15:43,842 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
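To make an excerpt like the one above easier to read, the handoff events can be tallied per endpoint. The following is only a minimal Python sketch: the regular expressions are derived from the message text quoted above, while the function name `summarize_handoff` and the summary layout are hypothetical and not part of Cassandra.

```python
import re
from collections import defaultdict

# Patterns derived from the HintedHandOffManager messages quoted above.
STARTED = re.compile(r"Started hinted handoff for token: \d+ with IP: /(\S+)")
FINISHED = re.compile(r"Finished hinted handoff of (\d+) rows to endpoint /(\S+)")
TIMED_OUT = re.compile(r"Timed out replaying hints to /([^;\s]+)")


def summarize_handoff(lines):
    """Count hinted handoff events per endpoint from an iterable of log lines."""
    summary = defaultdict(
        lambda: {"started": 0, "finished": 0, "rows": 0, "timed_out": 0}
    )
    for line in lines:
        m = STARTED.search(line)
        if m:
            summary[m.group(1)]["started"] += 1
            continue
        m = FINISHED.search(line)
        if m:
            summary[m.group(2)]["finished"] += 1
            summary[m.group(2)]["rows"] += int(m.group(1))
            continue
        m = TIMED_OUT.search(line)
        if m:
            summary[m.group(1)]["timed_out"] += 1
    return dict(summary)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python summarize_hints.py < system.log
    for endpoint, counts in sorted(summarize_handoff(sys.stdin).items()):
        print(endpoint, counts)
```

An endpoint whose "started" count is well above its "finished" count is one where deliveries keep restarting without completing.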

 

 

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Thursday, March 15, 2012 1:51 AM
To: user@cassandra.apache.org
Subject: Re: Large hints column family

 

Is there anything going on in the logs? Are nodes going up and down? Can you see any messages about delivering hints?

 

If the query to read the hints fails, it will log "HintsCF getEPPendingHints timed out" at INFO level.

 

Also, just to check: do the hinted_handoff_* settings in cassandra.yaml still have their default values?

 

Cheers

 

-----------------

Aaron Morton

Freelance Developer

@aaronmorton

 

On 15/03/2012, at 8:35 AM, Bryce Godfrey wrote:



Forgot to mention that this is on Cassandra 1.0.8.

 

From: Bryce Godfrey [mailto:Bryce.Godfrey@azaleos.com] 
Sent: Wednesday, March 14, 2012 12:34 PM
To: user@cassandra.apache.org
Subject: Large hints column family

 

The system HintsColumnFamily seems large in my cluster, and I want to track down why. I tried invoking “listEndpointsPendingHints()” on o.a.c.db.HintedHandoffManager, but it never returns and also freezes the node it is invoked against. It’s a 3-node cluster, and all nodes have been up and running without issue for a while. Any help on where to start with this?

 

                Column Family: HintsColumnFamily
                SSTable count: 11
                Space used (live): 11271669539
                Space used (total): 11271669539
                Number of Keys (estimate): 1408
                Memtable Columns Count: 338
                Memtable Data Size: 0
                Memtable Switch Count: 1
                Read Count: 3
                Read Latency: 4354.669 ms.
                Write Count: 848
                Write Latency: 0.029 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 0
                Bloom Filter False Ratio: 0.00000
                Bloom Filter Space Used: 12656
                Key cache capacity: 14
                Key cache size: 11
                Key cache hit rate: 0.6666666666666666
                Row cache: disabled
                Compacted row minimum size: 105779
                Compacted row maximum size: 7152383774
                Compacted row mean size: 590818614
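
For scale, the raw byte counts above can be converted to binary units. This is a small Python sketch; the helper name `human_size` is hypothetical and the values are copied from the statistics above.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TiB"


# Byte counts copied from the HintsColumnFamily statistics above.
stats = {
    "Space used (live)": 11271669539,
    "Compacted row minimum size": 105779,
    "Compacted row maximum size": 7152383774,
    "Compacted row mean size": 590818614,
}

for name, value in stats.items():
    print(f"{name}: {human_size(value)}")
```

That works out to about 10.5 GiB of live data, with a largest compacted row of about 6.66 GiB.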

 

Thanks,

Bryce