cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vasiliy I Ozerov <voze...@news360.ru>
Subject Huge ReadStage Pending tasks during startup
Date Fri, 27 Nov 2015 10:52:34 GMT
Hello!

We have some strange troubles with cassandra startup. Cluster consists of 4 nodes. 32 Gb RAM
per node, each node has about 30Gb of data, 8 CPU.

root@vega010:~# nodetool version ReleaseVersion: 2.2.1

So, before stop (using disablethrift, drain):

nodetool tpstats: Read Stage 0 0 3093579 0 0

Just after start in logs:

INFO [main] http://airmail.calendar/2015-11-25%2013:22:04%20GMT+3 YamlConfigurationLoader.java:92
- Loading settings from file:/etc/cassandra/cassandra.yaml 
. . . skipped . . .
INFO [main] http://airmail.calendar/2015-11-25%2013:22:21%20GMT+3 CommitLog.java:168 - Replaying
/var/lib/cassandra/commitlog/CommitLog–5–1448388020045.log, /var/lib/cassandra/commitlog/CommitLog–5–1448388020046.log,
/var/lib/cassand
. . .skipped. . .
INFO [main] http://airmail.calendar/2015-11-25%2013:23:44%20GMT+3 CommitLog.java:170 - Log
replay complete, 1047857 replayed mutations
. . . skipped .. .
INFO [CompactionExecutor:4] http://airmail.calendar/2015-11-25%2013:23:45%20GMT+3 CompactionTask.java:142
- Compacting (cf08d1d0–93ba–11e5-b9f0–7be7ca1986fb) [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la–3479-big-Data.db:level=0,
/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la–3474-big-Data.db:level=0,
/var/lib/cassandra/data/system/compaction_history-b4db
. . . skipped. . .
INFO [HANDSHAKE-/10.50.2.60] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494
- Handshaking version with /10.50.2.60
INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:1003
- Node /10.50.2.66 has restarted, now UP
WARN [GossipTasks:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 FailureDetector.java:243
- Not marking nodes down due to local pause of 101075806441 > 5000000000
INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869
- Node /10.50.2.66 state jump to normal
INFO [HANDSHAKE-/10.50.2.60] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494
- Handshaking version with /10.50.2.60
INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:1003
- Node /10.50.2.60 has restarted, now UP
INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869
- Node /10.50.2.60 state jump to normal
INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:1003
- Node /10.50.2.57 has restarted, now UP
INFO [HANDSHAKE-/10.50.2.66] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494
- Handshaking version with /10.50.2.66
INFO [HANDSHAKE-/10.50.2.57] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494
- Handshaking version with /10.50.2.57
INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869
- Node /10.50.2.57 state jump to normal
INFO [SharedPool-Worker–20] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.60 is now UP
INFO [main] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 ColumnFamilyStore.java:743
- Completed loading (557 ms; 7022 shards) counter cache for SourcesAggregatedEventsV2.StoryReadingTimeSumPerDay_UTC_P_7
INFO [HANDSHAKE-/10.50.2.66] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494
- Handshaking version with /10.50.2.66
INFO [main] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 AutoSavingCache.java:146
- reading saved cache /var/lib/cassandra/saved_caches/SourcesAggregatedEventsV2-StoryReadingTimeSumPerDay_UTC_N_2-f318e310735f11e5b9599b83dc51d0b0-CounterCache-c.db
INFO [SharedPool-Worker–13] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.57 is now UP INFO [SharedPool-Worker–3] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker–16] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.57 is now UP INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869
- Node /10.50.2.60 state jump to normal
INFO [SharedPool-Worker–4] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.66 is now UP INFO [SharedPool-Worker–20] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker–1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.66 is now UP INFO [SharedPool-Worker–5] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.66 is now UP
INFO [SharedPool-Worker–2] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970
- InetAddress /10.50.2.66 is now UP INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869
- Node /10.50.2.57 state jump to normal
INFO [ScheduledTasks:1] http://airmail.calendar/2015-11-25%2013:25:55%20GMT+3 StatusLogger.java:51
- Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] http://airmail.calendar/2015-11-25%2013:25:55%20GMT+3 StatusLogger.java:55
- ReadStage 32 2753202 69509 0 0
INFO [ScheduledTasks:1] http://airmail.calendar/2015-11-25%2013:25:55%20GMT+3 StatusLogger.java:55
- MutationStage 32 602 9197964 0 0

So, just after start it has 2753202 pending readstage tasks. And it takes about 11 hours to
complete them all.

So, what could be the reason? 

-- 
Vasiliy I Ozerov
Sent with Airmail
Mime
View raw message