cassandra-user mailing list archives

From Laxmikant Upadhyay <laxmikant....@gmail.com>
Subject Re: Cassandra 3.11.3 map errors
Date Wed, 23 Jan 2019 05:19:36 GMT
I have observed a memory leak in version 3.11.2 in a situation similar to the
one Bobbie mentioned. We faced corrupt SSTable issues after disk space became
100% full (due to snapshots) on multiple nodes (we have the default
disk_failure_policy).
After clearing the snapshots we ran repair and observed the following issues
when running 'nodetool repair -full -pr':

*1. Repair failed, giving multiple warnings followed by an
ArrayIndexOutOfBoundsException:*
WARN  [CompactionExecutor:302] 2019-01-21 06:04:01,502
LeveledCompactionStrategy.java:144 - Could not acquire references for
compacting SSTables ......which is not a problem per se,unless it happens
frequently, in which case it must be reported. Will retry later.

ERROR [ValidationExecutor:49] 2019-01-21 05:46:39,810 Validator.java:268 -
Failed creating a merkle tree for [repair
#441ad710-1d11-11e9-acec-43289f88a823 on ks1/table1, [
ERROR [ValidationExecutor:49] 2019-01-21 05:46:39,811
CassandraDaemon.java:228 - Exception in thread
Thread[ValidationExecutor:49,1,main]
java.lang.ArrayIndexOutOfBoundsException: null

*2. After this failure, we restarted the node. Compaction was triggered
after startup and failed with a CorruptSSTableException error:*

Caused by: org.apache.cassandra.io.compress.CorruptBlockException:
(/var/lib/cassandra/data/ks1/table2-f7a6e260155a11e996298b24fac245dc/mc-2154-big-Data.db):
corruption detected, chunk at 51859444 of length 10387.
ERROR [CompactionExecutor:3] 2019-01-21 07:27:03,282
CassandraDaemon.java:244 - Exception in thread
Thread[CompactionExecutor:3,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
/var/lib/cassandra/data/ks1/table2-f7a6e260155a11e996298b24fac245dc/mc-2154-big-Data.db

I am not sure why it did not throw CorruptSSTableException in the first place.

*3. One node went down without writing anything to system.log. On further
checking the Linux messages log, we found that the Cassandra process went
down due to OOM.* After starting the node, neither system.log nor
compactionhistory shows *any record of activity at the time the node went
down*.
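
For anyone who wants to check for the same thing, here is a minimal sketch
of how the messages log can be scanned for OOM-killer entries. It assumes the
kernel's usual "Killed process <pid> (<name>)" wording and that the Cassandra
JVM shows up under the process name "java". The function name is my own, and
the example line in the comment is illustrative, not copied from our logs.

```python
import re

# Assumed kernel wording, e.g.:
#   "... kernel: Killed process 4242 (java) total-vm:...kB, anon-rss:...kB"
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines, process_name="java"):
    """Return (pid, line) pairs for OOM-killer kills of process_name.

    log_lines is any iterable of log lines, for example an open file.
    """
    hits = []
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits
```

It can be run against the log with something like
find_oom_kills(open("/var/log/messages")), where the path is an assumption
about where the distribution writes kernel messages.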

@Bobbie, have you raised a JIRA for this issue?

On Wed, Jan 16, 2019 at 1:15 PM Roy Burstein <burstein.roy@gmail.com> wrote:

> Hi Bob,
> We are not using Reaper (yet); we are just trying to run tests against the
> new C* v3 cluster.
> The nodes keep crashing all the time, and these are the errors we are
> getting.
> Any other ideas?
> Thanks!
> Roy
>
> On Wed, Jan 16, 2019 at 8:51 AM Bobbie Haynes <haynes30349@gmail.com>
> wrote:
>
>> Hi Roy,
>> I don't think the memory leak issue is related to the MAP errors. I was also
>> using Reaper in our cluster. I have seen the memory leak issue (ERROR
>> [Reference-Reaper:1] 2019-01-14 00:03:46,469 Ref.java:224 - LEAK DETECTED)
>> when some SSTables got corrupted because of a disk space issue we had while
>> compactions were running.
>> I guess you have to report this memory leak issue to the Reaper tool JIRA.
>>
>> Thanks,
>> Bob
>>
>> On Mon, Jan 14, 2019 at 8:44 AM Roy Burstein <burstein.roy@gmail.com>
>> wrote:
>>
>>> Hi ,
>>>
>>> We are testing C* 3.11.3 and we have a mapping issue and possibly leaked
>>> memory.
>>> It might be related to our configuration; any ideas would be helpful.
>>>
>>>
>>>
>>> Cassandra version: 3.11.3
>>> OS: CentOS Linux release 7.4.1708 (Core)
>>> Kernel: 3.10.0-957.1.3.el7.x86_64
>>> JDK: jdk1.8.0_131
>>> Heap: same errors with 16GB / 32GB / 64GB.
>>>
>>> *We are seeing these errors in production:*
>>>
>>> *java.io.IOException: Map failed:*
>>>
>>> ERROR [CompactionExecutor:5017] 2019-01-14 00:02:04,763 CassandraDaemon.java:228
- Exception in thread Thread[CompactionExecutor:5017,1,main]
>>> org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
>>>         at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:181)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:104)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:362)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:290)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:179)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:134)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:65)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:142)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:201)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:85)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:274)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_131]
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_131]
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_131]
>>>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
[apache-cassandra-3.11.3.jar:3.11.3]
>>>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
>>> Caused by: java.io.IOException: Map failed
>>>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940) ~[na:1.8.0_131]
>>>         at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153)
~[apache-cassandra-3.11.3.jar:3.11.3]
>>>         ... 23 common frames omitted
>>> Caused by: java.lang.OutOfMemoryError: Map failed
>>>         at sun.nio.ch.FileChannelImpl.map0(Native Method) ~[na:1.8.0_131]
>>>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937) ~[na:1.8.0_131]
>>>         ... 24 common frames omitted
>>>
>>> *LEAK DETECTED:*
>>>
>>> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,469 Ref.java:224 - LEAK DETECTED:
a reference (org.apache.cassandra.utils.concurrent.Ref$State@6a4ef142) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1651696741:Memory@[6b91a27c5290..6b91a27de290)
was not released before the reference was garbage collected
>>> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,520 Ref.java:224 - LEAK DETECTED:
a reference (org.apache.cassandra.utils.concurrent.Ref$State@6c458f8a) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1179238225:/var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_01_13-19be8e90037011e9a45847402874bbd7/mc-1209-big-Index.db
was not released before the reference was garbage collected
>>> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,520 Ref.java:224 - LEAK DETECTED:
a reference (org.apache.cassandra.utils.concurrent.Ref$State@5b90823b) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@783549664:/var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_01_13-19be8e90037011e9a45847402874bbd7/mc-1209-big-Data.db
was not released before the reference was garbage collected
>>> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,520 Ref.java:224 - LEAK DETECTED:
a reference (org.apache.cassandra.utils.concurrent.Ref$State@6ecdf763) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@1710583516:[Memory@[0..3e24),
Memory@[0..45e88)] was not released before the reference was garbage collected
>>>
>>>
>>> *Limits of Cassandra process:*
>>>
>>>  [root@cass063 ~ ]# cat /proc/`ps -ef | grep CassandraDaemon | grep -v grep | awk '{print $2}'`/limits
>>>  Limit                     Soft Limit           Hard Limit           Units
>>>  Max cpu time              unlimited            unlimited            seconds
>>>  Max file size             unlimited            unlimited            bytes
>>>  Max data size             unlimited            unlimited            bytes
>>>  Max stack size            8388608              unlimited            bytes
>>>  Max core file size        0                    unlimited            bytes
>>>  Max resident set          unlimited            unlimited            bytes
>>>  Max processes             32768                32768                processes
>>>  Max open files            100000               100000               files
>>>  Max locked memory         unlimited            unlimited            bytes
>>>  Max address space         unlimited            unlimited            bytes
>>>  Max file locks            unlimited            unlimited            locks
>>>  Max pending signals       766985               766985               signals
>>>  Max msgqueue size         819200               819200               bytes
>>>  Max nice priority         0                    0
>>>  Max realtime priority     0                    0
>>>  Max realtime timeout      unlimited            unlimited            us
>>>
>>>
>>>
>>> *max_map_count parameter on OS:*
>>>
>>>  [root@cass063 ~]# sysctl vm.max_map_count
>>>  vm.max_map_count = 1073741824
>>>
>>>
>>>
>>>
>>> *cassandra.yaml:*
>>>
>>>  cluster_name: 'Cass Cluster'
>>>  num_tokens: 256
>>>  hinted_handoff_enabled: false
>>>  max_hint_window_in_ms: 10800000
>>>  hinted_handoff_throttle_in_kb: 1024
>>>  max_hints_delivery_threads: 2
>>>  hints_directory: /var/lib/cassandra/hints
>>>  hints_flush_period_in_ms: 10000
>>>  max_hints_file_size_in_mb: 128
>>>  batchlog_replay_throttle_in_kb: 1024
>>>  authenticator: AllowAllAuthenticator
>>>  authorizer: AllowAllAuthorizer
>>>  role_manager: CassandraRoleManager
>>>  roles_validity_in_ms: 2000
>>>  permissions_validity_in_ms: 2000
>>>  credentials_validity_in_ms: 2000
>>>  partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>>  data_file_directories:
>>>      - /var/lib/cassandra/data/disk1
>>>  commitlog_directory: /var/lib/cassandra/data/disk1/commitlog
>>>  cdc_enabled: false
>>>  disk_failure_policy: stop
>>>  commit_failure_policy: stop
>>>  prepared_statements_cache_size_mb:
>>>  thrift_prepared_statements_cache_size_mb:
>>>  key_cache_size_in_mb: 0
>>>  key_cache_save_period: 3600
>>>  row_cache_size_in_mb: 0
>>>  row_cache_save_period: 0
>>>  counter_cache_size_in_mb:
>>>  counter_cache_save_period: 7200
>>>  saved_caches_directory: /var/lib/cassandra/data/disk1/saved_caches
>>>  commitlog_sync: periodic
>>>  commitlog_sync_period_in_ms: 10000
>>>  commitlog_segment_size_in_mb: 32
>>>  seed_provider:
>>>      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>>        parameters:
>>>            - seeds: "10.110.30.1,10.110.30.2,10.110.30.3"
>>>  concurrent_reads: 48
>>>  concurrent_writes: 96
>>>  concurrent_counter_writes: 32
>>>  concurrent_materialized_view_writes: 32
>>>  file_cache_size_in_mb: 10240
>>>  memtable_offheap_space_in_mb: 10240
>>>  memtable_cleanup_threshold: 0.1
>>>  memtable_allocation_type: offheap_buffers
>>>  commitlog_total_space_in_mb: 8192
>>>  memtable_flush_writers: 8
>>>  index_summary_capacity_in_mb:
>>>  index_summary_resize_interval_in_minutes: 60
>>>  trickle_fsync: true
>>>  trickle_fsync_interval_in_kb: 10240
>>>  storage_port: 7000
>>>  ssl_storage_port: 7001
>>>  listen_address: 10.106.62.34
>>>  start_native_transport: true
>>>  native_transport_port: 9042
>>>  start_rpc: false
>>>  rpc_address: 0.0.0.0
>>>  rpc_port: 9160
>>>  broadcast_rpc_address: 10.106.62.34
>>>  rpc_keepalive: true
>>>  rpc_server_type: hsha
>>>  rpc_max_threads: 128
>>>  thrift_framed_transport_size_in_mb: 15
>>>  incremental_backups: false
>>>  snapshot_before_compaction: false
>>>  auto_snapshot: true
>>>  column_index_size_in_kb: 64
>>>  column_index_cache_size_in_kb: 2
>>>  concurrent_compactors: 32
>>>  compaction_throughput_mb_per_sec: 500
>>>  sstable_preemptive_open_interval_in_mb: 50
>>>  stream_throughput_outbound_megabits_per_sec: 0
>>>  read_request_timeout_in_ms: 10000
>>>  range_request_timeout_in_ms: 10000
>>>  write_request_timeout_in_ms: 60000
>>>  counter_write_request_timeout_in_ms: 10000
>>>  cas_contention_timeout_in_ms: 1000
>>>  truncate_request_timeout_in_ms: 60000
>>>  request_timeout_in_ms: 10000
>>>  slow_query_log_timeout_in_ms: 500
>>>  cross_node_timeout: false
>>>  phi_convict_threshold: 12
>>>  endpoint_snitch: GossipingPropertyFileSnitch
>>>  dynamic_snitch_update_interval_in_ms: 100
>>>  dynamic_snitch_reset_interval_in_ms: 600000
>>>  dynamic_snitch_badness_threshold: 0.5
>>>  request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>>>  server_encryption_options:
>>>      internode_encryption: none
>>>      keystore: conf/.keystore
>>>      keystore_password: cassandra
>>>      truststore: conf/.truststore
>>>      truststore_password: cassandra
>>>  client_encryption_options:
>>>      enabled: false
>>>      optional: false
>>>      keystore: conf/.keystore
>>>      keystore_password: cassandra
>>>  internode_compression: dc
>>>  inter_dc_tcp_nodelay: false
>>>  tracetype_query_ttl: 86400
>>>  tracetype_repair_ttl: 604800
>>>  enable_user_defined_functions: false
>>>  enable_scripted_user_defined_functions: false
>>>  enable_materialized_views: true
>>>  windows_timer_interval: 1
>>>  transparent_data_encryption_options:
>>>      enabled: false
>>>      chunk_length_kb: 64
>>>      cipher: AES/CBC/PKCS5Padding
>>>      key_alias: testing:1
>>>      key_provider:
>>>        - class_name: org.apache.cassandra.security.JKSKeyProvider
>>>          parameters:
>>>            - keystore: conf/.keystore
>>>              keystore_password: cassandra
>>>              store_type: JCEKS
>>>              key_password: cassandra
>>>  tombstone_warn_threshold: 1000
>>>  tombstone_failure_threshold: 100000
>>>  batch_size_warn_threshold_in_kb: 5
>>>  batch_size_fail_threshold_in_kb: 50
>>>  unlogged_batch_across_partitions_warn_threshold: 10
>>>  compaction_large_partition_warning_threshold_mb: 10
>>>  gc_warn_threshold_in_ms: 1000
>>>  back_pressure_enabled: false
>>>  back_pressure_strategy:
>>>      - class_name: org.apache.cassandra.net.RateBasedBackPressure
>>>        parameters:
>>>          - high_ratio: 0.90
>>>            factor: 5
>>>            flow: FAST
>>>
>>>
>>>
>>> *The Cassandra process has a lot of maps (over 200K):*
>>>
>>> [root@cass063 ~]# wc -l /proc/`ps -ef | grep CassandraDaemon | grep -v grep | awk '{print $2}'`/maps
>>> 239587 /proc/202664/maps
>>>
>>>  Thanks,
>>> Roy
>>>
>>
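
On the map count reported in the original message (239587 mappings against a
vm.max_map_count of 1073741824): a small sketch like the one below could show
which files account for most of the mappings. It relies only on the standard
/proc/<pid>/maps line layout (address, perms, offset, dev, inode, optional
pathname). The function name and the "[anonymous]" label are my own choices.

```python
from collections import Counter


def summarize_maps(maps_text):
    """Count mappings per backing path in the text of /proc/<pid>/maps."""
    counts = Counter()
    for line in maps_text.splitlines():
        # At most 6 fields; the pathname (if any) is last and may hold spaces.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        if len(fields) == 6 and fields[5].strip():
            path = fields[5].strip()
        else:
            path = "[anonymous]"  # mapping with no backing file
        counts[path] += 1
    return counts
```

For example, summarize_maps(open("/proc/202664/maps").read()).most_common(10)
would list the ten paths with the most mappings for the PID shown in the
original output. If SSTable Data.db files dominate the list, that would point
at mmapped SSTable access rather than at anonymous memory.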

-- 

regards,
Laxmikant Upadhyay
