cassandra-commits mailing list archives

From Łukasz Mrożkiewicz (JIRA) <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-9798) Cassandra seems to have deadlocks during flush operations
Date Mon, 20 Jul 2015 12:42:05 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Łukasz Mrożkiewicz updated CASSANDRA-9798:
------------------------------------------
    Attachment: topHbn1.txt
                stack.txt

Stack dump and top output captured during the deadlock.
MutationStage                   128   1312866        1093469
MemtableFlushWriter               6        29             37         0                 0
MemtablePostFlush                 1        33             47         0                 0
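
The stall pattern reported in this issue (pending work growing while the completed count stays flat) can be checked by comparing two thread-pool snapshots. The following is a minimal sketch, assuming rows of the form `<pool> <active> <pending> <completed> ...`; the names `parse_rows` and `stalled_pools` are hypothetical helpers, and the `EARLIER` snapshot is made up for illustration (only the `LATER` rows come from this message).

```python
from typing import Dict, List, NamedTuple


class PoolStats(NamedTuple):
    active: int
    pending: int
    completed: int


def parse_rows(text: str) -> Dict[str, PoolStats]:
    """Parse rows shaped like: <pool name> <active> <pending> <completed> [...]."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, header rows, and rows without three leading integers.
        if len(parts) < 4 or not all(p.isdigit() for p in parts[1:4]):
            continue
        pools[parts[0]] = PoolStats(int(parts[1]), int(parts[2]), int(parts[3]))
    return pools


def stalled_pools(before: Dict[str, PoolStats], after: Dict[str, PoolStats]) -> List[str]:
    """Return pools whose completed count did not move while work stayed queued."""
    stalled = []
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        no_progress = new.completed == old.completed
        backlog = new.pending > 0 and new.pending >= old.pending
        if no_progress and backlog:
            stalled.append(name)
    return sorted(stalled)


# Hypothetical earlier snapshot (illustrative values, not taken from the report).
EARLIER = """
MutationStage                   128    900000        1093469
MemtableFlushWriter               6        25             37         0                 0
MemtablePostFlush                 1        29             47         0                 0
"""

# Rows as they appear in this message.
LATER = """
MutationStage                   128   1312866        1093469
MemtableFlushWriter               6        29             37         0                 0
MemtablePostFlush                 1        33             47         0                 0
"""

if __name__ == "__main__":
    print(stalled_pools(parse_rows(EARLIER), parse_rows(LATER)))
```

Comparing counters across two snapshots, rather than reading a single one, is the design choice here: a large pending queue on its own may only reflect a burst, while a completed count that does not advance suggests the pool is not draining.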


> Cassandra seems to have deadlocks during flush operations
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9798
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9798
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 4x HP Gen9 dl 360 servers
> 2x8 cpu each (Intel(R) Xeon E5-2667 v3 @ 3.20GHz)
> 6x900GB 10kRPM disk for data
> 1x900GB 10kRPM disk for commitlog
> 64GB ram
> ETH: 10Gb/s
> Red Hat Enterprise Linux Server release 6.6 (Santiago) 2.6.32-504.el6.x86_64
> java build 1.8.0_45-b14 (openjdk) (tested on oracle java 8 too)
>            Reporter: Łukasz Mrożkiewicz
>             Fix For: 2.1.x
>
>         Attachments: cassandra.2.1.8.log, cassandra.log, cassandra.yaml, cassandra.yaml, gc.log.0.current, stack.txt, topHbn1.txt
>
>
> Hi,
> We noticed a problem with dropped mutations in MutationStage. Usually, on one random node, the following happens:
> MutationStage: "active" is full, "pending" is increasing, "completed" is stalled.
> MemtableFlushWriter: "active" is 6, "pending" is 25, "completed" is stalled.
> MemtablePostFlush: "active" is 1, "pending" is 29, "completed" is stalled.
> After some time (30 s to 10 min) the pending mutations are dropped and everything works again.
> When it happens:
> 1. CPU idle is ~95%.
> 2. There are no long GC pauses or increased GC activity.
> 3. Memory usage is 3.5 GB of 8 GB.
> 4. Cassandra is processing only writes.
> 5. The problems appeared once LOAD exceeded 400 GB/node.
> 6. Cassandra version is 2.1.6.
> There is a gap in the logs:
> {code}
> INFO  08:47:01 Timed out replaying hints to /192.168.100.83; aborting (0 delivered)
> INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap
> INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:31 Enqueuing flush of table1: 60462632 (3%) on-heap, 0 (0%) off-heap
> INFO  08:47:31 Enqueuing flush of table2: 76973746 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:31 Enqueuing flush of table1: 84290135 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:32 Enqueuing flush of table3: 56926652 (3%) on-heap, 0 (0%) off-heap
> INFO  08:47:32 Enqueuing flush of table1: 85124218 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap
> INFO  08:47:58 CompactionManager                 2        39
> INFO  08:47:58 Writing Memtable-table2@1767938721(13843064 serialized bytes, 162359 ops, 4%/0% of on/off-heap limit)
> INFO  08:47:58 Writing Memtable-hints@1433125911(478703 serialized bytes, 424 ops, 0%/0% of on/off-heap limit)
> INFO  08:47:58 Writing Memtable-table2@1318583275(11783615 serialized bytes, 137378 ops, 4%/0% of on/off-heap limit)
> INFO  08:47:58 Enqueuing flush of compactions_in_progress: 969 (0%) on-heap, 0 (0%) off-heap
> INFO  08:47:58 Writing Memtable-table1@541175113(17221327 serialized bytes, 180792 ops, 4%/0% of on/off-heap limit)
> INFO  08:47:58 Writing Memtable-table1@1361154669(27138519 serialized bytes, 273472 ops, 6%/0% of on/off-heap limit)
> INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms
> {code}
> Use case:
> 100% writes at 100Mb/s, a couple of CFs with ~10 columns each, max cell size 100B.
> Both CMS and G1GC were tested, with no difference.
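
Gaps in the log excerpt above, such as the one between 08:47:33 and 08:47:58, can be located mechanically. The following is a minimal sketch, assuming each entry starts with a level and an `HH:MM:SS` timestamp as in the excerpt; the function name `find_gaps` and the 20-second threshold are assumptions chosen for illustration, and the sample lines are a subset of the excerpt.

```python
import re
from datetime import datetime

# Matches an optional quote marker, a log level, then an HH:MM:SS timestamp.
LINE_RE = re.compile(r"^\s*(?:>\s*)?[A-Z]+\s+(\d{2}:\d{2}:\d{2})\b")


def find_gaps(lines, threshold_seconds=20):
    """Return (start, end, seconds) for consecutive entries at least threshold apart.

    Assumes all entries fall on the same day; midnight rollover is not handled.
    """
    gaps = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # wrapped continuation lines carry no timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            delta = int((current - previous).total_seconds())
            if delta >= threshold_seconds:
                gaps.append((previous.strftime("%H:%M:%S"),
                             current.strftime("%H:%M:%S"),
                             delta))
        previous = current
    return gaps


# A subset of the log lines quoted in the report.
SAMPLE = [
    "INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:58 CompactionManager                 2        39",
    "INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms",
]

if __name__ == "__main__":
    for start, end, seconds in find_gaps(SAMPLE):
        print(f"{start} -> {end}: {seconds}s without log output")
```

With the assumed 20-second threshold, the sample yields two intervals (08:47:01 to 08:47:30 and 08:47:33 to 08:47:58); which of these the reporter had in mind is not stated in the issue.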



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
