cassandra-commits mailing list archives

From "Navjyot Nishant (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12655) Incremental repair & compaction hang on random nodes
Date Fri, 16 Sep 2016 13:49:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496384#comment-15496384 ]

Navjyot Nishant commented on CASSANDRA-12655:
---------------------------------------------

Thanks Marcus. Well, all of those issues have a clear point of identification: either an error is logged in system.log or the user sees some sort of error, so the issue can be related to a cause. In our case we don't see any ERROR, WARN, timeout, failure, etc. in system.log, so we have no clues. We want to understand what is causing this and where the gap is.
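
One thing we can do while the node is hung (a sketch; it assumes the JDK tools are available on the box) is take a JVM thread dump, since the CompactionExecutor and repair threads show up there even when nothing is logged:
{code}
# Find the Cassandra JVM and dump all thread stacks to a file
CASSANDRA_PID=$(pgrep -f CassandraDaemon)
jstack -l "$CASSANDRA_PID" > /tmp/cassandra-threads.txt
# Inspect the compaction threads for BLOCKED/WAITING states
grep -A 20 "CompactionExecutor" /tmp/cassandra-threads.txt | less
{code}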

> Incremental repair & compaction hang on random nodes
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12655
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12655
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: CentOS Linux release 7.1.1503 (Core)
> RAM - 64GB
> HEAP - 16GB
> Load on each node - ~5GB
> Cassandra Version - 2.2.5
>            Reporter: Navjyot Nishant
>            Priority: Blocker
>
> Hi, we are setting up incremental repair on our 18 node cluster. Avg load on each node is ~5GB. The repair runs fine on a couple of nodes and then suddenly gets stuck on random nodes. Upon checking the system.log of an impacted node we don't see much information.
> Following are the lines we see in system.log; they have been there since the point the repair stopped making progress:
> {code}
> INFO  [CompactionExecutor:3490] 2016-09-16 11:14:44,236 CompactionManager.java:1221 - Anticompacting [BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30832-big-Data.db'), BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30811-big-Data.db')]
> INFO  [IndexSummaryManager:1] 2016-09-16 11:14:49,954 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [IndexSummaryManager:1] 2016-09-16 12:14:49,961 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> {code}
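> One check we can run from the shell (a sketch, reusing the data directory path from the log excerpt above) is whether the anticompaction output files are still growing; if the sizes stay flat for several minutes, the task is stuck rather than slow:
> {code}
> # Watch the affected table's data directory for file-size changes
> watch -n 30 "ls -lt /cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/ | head"
> {code}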
> When we try to see pending compactions by executing {code}nodetool compactionstats{code} it hangs as well and doesn't return anything. However, {code}nodetool tpstats{code} shows active and pending compactions which never come down and keep increasing:
> {code}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0         221208         0                 0
> ReadStage                         0         0        1288839         0                 0
> RequestResponseStage              0         0         104356         0                 0
> ReadRepairStage                   0         0             72         0                 0
> CounterMutationStage              0         0              0         0                 0
> HintedHandoff                     0         0             46         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                8        66          68124         0                 0
> MemtableReclaimMemory             0         0            166         0                 0
> PendingRangeCalculator            0         0             38         0                 0
> GossipStage                       0         0         242455         0                 0
> MigrationStage                    0         0              0         0                 0
> MemtablePostFlush                 0         0           3682         0                 0
> ValidationExecutor                0         0           2246         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0            166         0                 0
> InternalResponseStage             0         0           8866         0                 0
> AntiEntropyStage                  0         0          15417         0                 0
> Repair#7                          0         0            160         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> Native-Transport-Requests         0         0         327334         0                 0
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {code}
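> Even while {code}nodetool compactionstats{code} hangs, the 8 active CompactionExecutor threads from the output above are visible in a thread dump; extracting just their stacks (a sketch, assuming the JDK tools are on the node) shows what they are blocked on:
> {code}
> # Print each CompactionExecutor thread's stack, up to the next blank line
> jstack -l "$(pgrep -f CassandraDaemon)" | awk '/CompactionExecutor/,/^$/'
> {code}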
> {code}nodetool netstats{code} shows some pending messages which never get processed, and nothing in progress:
> {code}
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 15585
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed
> Large messages                  n/a        12            562
> Small messages                  n/a         0         999779
> Gossip messages                 n/a         0         264394
> {code}
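> The 12 pending Large messages look like outbound traffic that is queued but not moving. One quick check (a sketch; 7000 is the default inter-node storage port, adjust if yours differs) is whether the TCP connections to the peers are still established:
> {code}
> # List established inter-node connections on the storage port
> ss -tn state established '( dport = :7000 or sport = :7000 )'
> {code}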
> The only workaround we have is to bounce the node; after a bounce all the pending compactions start getting processed immediately and finish in 5-10 minutes.
> This is a road-blocker issue for us and any help in this matter would be highly appreciated.
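> For reference, a gentler bounce than killing the process (a sketch; the service name is an assumption from typical packaging) is to drain first, so memtables are flushed and commit log replay on startup is minimal:
> {code}
> # Flush memtables and stop accepting traffic, then restart the service
> nodetool drain
> sudo systemctl restart cassandra
> {code}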



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
