As a follow up to this conversation; we are still having issues with our
Cassandra cluster on EC2.
It *looks* to be related to Garbage Collection; however we aren't sure what
the root cause of the problem is. Here is an extract from logs:
INFO [GMFD:1] 2010-09-20 15:22:00,242 Gossiper.java (line 578) InetAddress
/10.102.57.197 is now UP
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:22:00,242
HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /
10.102.57.197
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:22:00,247
HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
endpoint /10.102.57.197
INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC
for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is
25907560448
INFO [WRITE-/10.102.7.187] 2010-09-20 15:27:42,052
OutboundTcpConnection.java (line 105) error writing to /10.102.7.187
INFO [GC inspection] 2010-09-20 15:27:42,082 GCInspector.java (line 150)
Pool Name Active Pending
INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156)
STREAM-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156)
RESPONSE-STAGE 0 3
INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156)
ROW-READ-STAGE 6 25
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
LB-OPERATIONS 0 0
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
MISCELLANEOUS-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
GMFD 1 129
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
CONSISTENCY-MANAGER 0 0
INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156)
LB-TARGET 0 0
INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156)
ROW-MUTATION-STAGE 1 1
INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156)
MESSAGE-STREAMING-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
LOAD-BALANCER-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
FLUSH-SORTER-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
MEMTABLE-POST-FLUSHER 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
AE-SERVICE-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 156)
FLUSH-WRITER-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 156)
HINTED-HANDOFF-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 161)
CompactionManager n/a 0
INFO [GMFD:1] 2010-09-20 15:27:42,088 Gossiper.java (line 592) Node /
10.102.7.187 has restarted, now UP again
INFO [GMFD:1] 2010-09-20 15:27:42,089 StorageService.java (line 548) Node /
10.102.7.187 state jump to normal
INFO [GMFD:1] 2010-09-20 15:27:42,089 StorageService.java (line 555) Will
not change my token ownership to /10.102.7.187
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:27:42,089
HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /
10.102.7.187
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:27:42,112
HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
endpoint /10.102.7.187
INFO [WRITE-/10.102.7.187] 2010-09-20 15:27:42,389
OutboundTcpConnection.java (line 105) error writing to /10.102.7.187
INFO [WRITE-/10.102.57.197] 2010-09-20 15:34:15,911
OutboundTcpConnection.java (line 105) error writing to /10.102.57.197
INFO [GC inspection] 2010-09-20 15:34:15,924 GCInspector.java (line 129) GC
for ParNew: 372272 ms, 82471240 reclaimed leaving 671616240 used; max is
25907560448
INFO [GC inspection] 2010-09-20 15:34:15,925 GCInspector.java (line 150)
Pool Name Active Pending
INFO [GC inspection] 2010-09-20 15:34:15,925 GCInspector.java (line 156)
STREAM-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156)
RESPONSE-STAGE 0 21
INFO [Timer-0] 2010-09-20 15:34:15,926 Gossiper.java (line 180) InetAddress
/10.102.7.187 is now dead.
INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156)
ROW-READ-STAGE 1 85
INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156)
LB-OPERATIONS 0 0
INFO [GC inspection] 2010-09-20 15:34:15,935 GCInspector.java (line 156)
MISCELLANEOUS-POOL 0 0
INFO [GC inspection] 2010-09-20 15:34:15,935 GCInspector.java (line 156)
GMFD 1 548
INFO [GC inspection] 2010-09-20 15:34:15,935 GCInspector.java (line 156)
CONSISTENCY-MANAGER 0 0
INFO [GC inspection] 2010-09-20 15:34:15,936 GCInspector.java (line 156)
LB-TARGET 0 0
INFO [GC inspection] 2010-09-20 15:34:15,936 GCInspector.java (line 156)
ROW-MUTATION-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:34:15,936 GCInspector.java (line 156)
MESSAGE-STREAMING-POOL 0 0
INFO [GC inspection] 2010-09-20 15:34:15,937 GCInspector.java (line 156)
LOAD-BALANCER-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:34:15,937 GCInspector.java (line 156)
FLUSH-SORTER-POOL 0 0
INFO [GC inspection] 2010-09-20 15:34:15,937 GCInspector.java (line 156)
MEMTABLE-POST-FLUSHER 0 0
INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 156)
AE-SERVICE-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 156)
FLUSH-WRITER-POOL 0 0
INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 156)
HINTED-HANDOFF-POOL 0 0
INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 161)
CompactionManager n/a 0
INFO [WRITE-/10.102.57.197] 2010-09-20 15:34:15,931
OutboundTcpConnection.java (line 105) error writing to /10.102.57.197
INFO [GMFD:1] 2010-09-20 15:34:15,960 Gossiper.java (line 578) InetAddress
/10.102.7.187 is now UP
INFO [GMFD:1] 2010-09-20 15:34:15,961 Gossiper.java (line 592) Node /
10.102.57.197 has restarted, now UP again
INFO [GMFD:1] 2010-09-20 15:34:15,961 StorageService.java (line 548) Node /
10.102.57.197 state jump to normal
INFO [GMFD:1] 2010-09-20 15:34:15,961 StorageService.java (line 555) Will
not change my token ownership to /10.102.57.197
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:34:15,961
HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /
10.102.7.187
Current thoughts - launching a GC for ParNew at 15:27:42, which takes 325411
ms, followed (almost immediately) by another at 15:34:15 which takes 372272
ms is probably sub-optimal.
INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC
for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is
25907560448
INFO [GC inspection] 2010-09-20 15:34:15,924 GCInspector.java (line 129) GC
for ParNew: 372272 ms, 82471240 reclaimed leaving 671616240 used; max is
25907560448
Can anyone help shed any light on why this might be happening? We've tried a
variety of JVM settings to alleviate this; currently with no luck.
Dave
On 17 September 2010 17:04, Jedd Rashbrooke <jedd.rashbrooke@imagini.net>wrote:
> Hi Rob,
>
> Thanks for your suggestions. I should have been a bit more verbose
> in my platform description -- I'm using 64-bit instances, which I think
> in a Ben Black video I saw led to a sensible default usage of mmap
> when left at auto. Should I look at forcing this setting?
>
> > You don't mention which version of the deb package you're using, but :
>
> I'm using 0.6.5 - the ones bundled by Eric Evans <eevans@apache.org>
>
> Because these are 32GB machines, I've not configured them with
> any swap at all. I've rarely done this in the past - but was aware
> there was this swap-hell scenario with JVM's, and the rationale
> makes sense -- it's better to have the JVM crash ''cleanly" than
> to have it grind the machine to a halt and make it impossible
> to get onto the machine to kill the process.
>
> cheers,
> Jedd.
>
|