incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Large hints column family
Date Fri, 16 Mar 2012 02:24:16 GMT
These messages make it look like the node is having trouble delivering hints. 
> INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting
> INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries
 
Take another look at the logs on this machine and on 20.4 and 20.3. 
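
To make that comparison across the three machines easier, here is a minimal Python sketch that tallies hinted handoff events per endpoint. The message patterns are copied from the log lines quoted in this thread; the function name and the sample list are made up for illustration, and it assumes each log entry sits on a single line, as it does in system.log rather than in this wrapped email.

```python
import re
from collections import Counter

# Patterns copied from the HintedHandOffManager messages quoted in this thread.
# Other Cassandra versions may word these messages differently.
PATTERNS = {
    "finished": re.compile(r"Finished hinted handoff of (\d+) rows to endpoint /(\S+)"),
    "died": re.compile(r"Endpoint /(\S+) died before hint delivery"),
    "timed_out": re.compile(r"Timed out replaying hints to /([^;\s]+)"),
    "started": re.compile(r"Started hinted handoff for token: \d+ with IP: /(\S+)"),
}


def summarize_hint_events(lines):
    """Return (events, rows): event counts keyed by (endpoint, event name),
    and total rows reported as delivered, keyed by endpoint."""
    events = Counter()
    rows = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            if name == "finished":
                endpoint = match.group(2)
                rows[endpoint] += int(match.group(1))
            else:
                endpoint = match.group(1)
            events[(endpoint, name)] += 1
            break  # a log line matches at most one pattern
    return events, rows


# Hypothetical sample: four entries from this thread, unwrapped onto single lines.
sample = [
    "INFO [HintedHandoff:1] 2012-03-13 16:13:22,215 HintedHandOffManager.java (line 373) Finished hinted handoff of 852703 rows to endpoint /192.168.20.3",
    "INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting",
    "INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries",
    "INFO [HintedHandoff:1] 2012-03-13 17:11:00,098 HintedHandOffManager.java (line 373) Finished hinted handoff of 44160 rows to endpoint /192.168.20.3",
]

events, rows = summarize_hint_events(sample)
for (endpoint, name), count in sorted(events.items()):
    print(endpoint, name, count)
for endpoint, total in sorted(rows.items()):
    print(endpoint, "rows delivered:", total)
```

To run it against a real log, pass an open file handle for that node's system.log instead of the sample list. Many "timed_out" or "died" events for one endpoint would point at that machine as the one to look at first.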

I would be looking into why so many hints are being stored. Is GC a problem? Are there also log entries about dropped messages?

If you want to reset the world, make sure all the nodes have run repair and then drop the hints, either via JMX or by stopping the node and deleting the files on disk.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/03/2012, at 12:58 PM, Bryce Godfrey wrote:

> We were having some occasional memory pressure issues, but we added more RAM to the nodes a few days ago and things are running more smoothly now. In general, nodes have not been going up and down.
>  
> I tried to do a “list HintsColumnFamily” from cassandra-cli and it locks up my Cassandra node and never returns, forcing me to kill the Cassandra process and restart it to get the node back.
>  
> Here are my settings, which I believe are the defaults since I don’t remember changing them:
>  
> hinted_handoff_enabled: true
> max_hint_window_in_ms: 3600000 # one hour
> hinted_handoff_throttle_delay_in_ms: 50
>  
> Grepping for "Hinted" in the system log, I get these:
> INFO [HintedHandoff:1] 2012-03-13 16:13:22,215 HintedHandOffManager.java (line 373) Finished hinted handoff of 852703 rows to endpoint /192.168.20.3
> INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting
> INFO [ScheduledTasks:1] 2012-03-13 16:15:32,569 StatusLogger.java (line 65) HintedHandoff                     1         1         0
> INFO [HintedHandoff:1] 2012-03-13 16:15:44,362 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
> INFO [HintedHandoff:1] 2012-03-13 16:21:37,266 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
> INFO [ScheduledTasks:1] 2012-03-13 16:23:07,662 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-13 16:25:49,330 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-13 16:30:52,503 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-13 16:42:22,202 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries
> INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@661547256(34298224/74465815 serialized/live bytes, 78808 ops)
> INFO [HintedHandoff:1] 2012-03-13 17:11:00,098 HintedHandOffManager.java (line 373) Finished hinted handoff of 44160 rows to endpoint /192.168.20.3
> INFO [HintedHandoff:1] 2012-03-13 17:11:36,596 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /192.168.20.4
> INFO [ScheduledTasks:1] 2012-03-13 17:12:25,248 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [HintedHandoff:1] 2012-03-13 18:47:56,151 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
> INFO [ScheduledTasks:1] 2012-03-13 18:50:24,326 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:12:48,177 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:13:57,685 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:14:57,258 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:14:58,260 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:15:59,093 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:16:59,428 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:18:01,862 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:18:01,898 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:19:04,527 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:19:04,541 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:20:07,712 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [ScheduledTasks:1] 2012-03-14 12:20:08,332 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [HintedHandoff:1] 2012-03-14 12:27:13,033 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
> INFO [ScheduledTasks:1] 2012-03-15 15:05:00,954 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [HintedHandoff:1] 2012-03-15 15:06:07,750 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries
> INFO [ScheduledTasks:1] 2012-03-15 15:06:07,802 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [HintedHandoff:1] 2012-03-15 15:06:07,809 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@254668880(103911/8312880 serialized/live bytes, 63877 ops)
> INFO [ScheduledTasks:1] 2012-03-15 15:07:13,503 StatusLogger.java (line 65) HintedHandoff                     1         2         0
> INFO [HintedHandoff:1] 2012-03-15 15:15:43,842 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3
>  
>  
> From: aaron morton [mailto:aaron@thelastpickle.com] 
> Sent: Thursday, March 15, 2012 1:51 AM
> To: user@cassandra.apache.org
> Subject: Re: Large hints column family
>  
> Is there anything going on in the logs? Are nodes going up and down? Can you see any messages about delivering hints?
>  
> If the query to read the hints fails, it will log "HintsCF getEPPendingHints timed out" at INFO level.
>  
> Also, just checking: do the hinted_handoff_* settings in cassandra.yaml have their default values?
>  
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 15/03/2012, at 8:35 AM, Bryce Godfrey wrote:
> 
> 
> Forgot to mention that this is on 1.0.8
>  
> From: Bryce Godfrey [mailto:Bryce.Godfrey@azaleos.com] 
> Sent: Wednesday, March 14, 2012 12:34 PM
> To: user@cassandra.apache.org
> Subject: Large hints column family
>  
> The system HintsColumnFamily seems large in my cluster, and I want to track down why that is. I tried invoking “listEndpointsPendingHints()” on o.a.c.db.HintedHandoffManager, but it never returns and also freezes the node it is invoked against. It’s a 3-node cluster, and all nodes have been up and running without issue for a while. Any help on where to start with this?
>  
>                Column Family: HintsColumnFamily
>                 SSTable count: 11
>                 Space used (live): 11271669539
>                 Space used (total): 11271669539
>                 Number of Keys (estimate): 1408
>                 Memtable Columns Count: 338
>                 Memtable Data Size: 0
>                 Memtable Switch Count: 1
>                 Read Count: 3
>                 Read Latency: 4354.669 ms.
>                 Write Count: 848
>                 Write Latency: 0.029 ms.
>                 Pending Tasks: 0
>                 Bloom Filter False Postives: 0
>                 Bloom Filter False Ratio: 0.00000
>                 Bloom Filter Space Used: 12656
>                 Key cache capacity: 14
>                 Key cache size: 11
>                 Key cache hit rate: 0.6666666666666666
>                 Row cache: disabled
>                 Compacted row minimum size: 105779
>                 Compacted row maximum size: 7152383774
>                 Compacted row mean size: 590818614
>  
> Thanks,
> Bryce

