From: Sandeep Nethi
To: user@cassandra.apache.org
Date: Wed, 1 May 2019 22:47:50 +1200
Subject: Re: cassandra node was put down with oom error

Are you by any chance running a full repair on these nodes?

Thanks,
Sandeep

On Wed, 1 May 2019 at 10:46 PM, Mia <yeomii999@gmail.com> wrote:
>
> Hello, Ayub.
>
> I'm using Apache Cassandra, not the DSE edition, so I have never used the
> DSE Search feature.
> In my case, all the nodes of the cluster have the same problem.
>
> Thanks.
>
> On 2019/05/01 06:13:06, Ayub M wrote:
> > Do you have search running on the same nodes, or is it only Cassandra?
> > In my case it was due to a memory leak bug in DSE Search that consumed
> > more memory, resulting in the OOM.
> >
> > On Tue, Apr 30, 2019, 2:58 AM yeomii999@gmail.com wrote:
> >
> > > Hello,
> > >
> > > I'm suffering from a similar problem with OSS Cassandra version 3.11.3.
> > > My Cassandra cluster has been running for more than a year, and there
> > > was no problem until this year.
> > > The cluster is write-intensive, consists of 70 nodes, and all rows have
> > > a 2-hour TTL.
> > > The only change was the read consistency, from QUORUM to ONE. (I cannot
> > > revert this change because of the read latency.)
> > > Below is my compaction strategy.
> > > ```
> > > compaction = {'class':
> > > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > > 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> > > 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> > > 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
> > > 'unchecked_tombstone_compaction': 'false'}
> > > ```
> > > I've tried rolling-restarting the cluster several times,
> > > but the memory usage of the Cassandra process always keeps growing.
> > > I also tried Native Memory Tracking, but it reports less memory usage
> > > than the system measures (RSS in /proc/{cassandra-pid}/status).
> > >
> > > Is there any way I could figure out the cause of this problem?
> > >
> > > On 2019/01/26 20:53:26, Jeff Jirsa wrote:
> > > > You're running DSE, so the OSS list may not be much help. Datastax may
> > > > have more insight.
> > > >
> > > > In open source, the only things off-heap that vary significantly are
> > > > bloom filters and compression offsets - both scale with disk space,
> > > > and both increase during compaction. Large STCS compactions can cause
> > > > pretty meaningful allocations for these. Also, if you have an
> > > > unusually low compression chunk size or a very low bloom filter FP
> > > > ratio, those will be larger.
> > > >
> > > > --
> > > > Jeff Jirsa
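For reference, those off-heap components are reported per table by nodetool, which makes it easy to see which one is actually growing. A rough sketch, assuming a 3.x node; the keyspace and table names are placeholders:

```
# Per-table off-heap breakdown: bloom filter, index summary, compression
# metadata and memtable off-heap usage (keyspace/table are placeholders).
nodetool tablestats my_keyspace.my_table | grep -i 'off heap'

# Node-wide off-heap total, to compare against the process RSS.
nodetool info | grep -i 'off heap'

# The two settings Jeff mentions; unusually low values make those
# structures larger.
cqlsh -e "DESCRIBE TABLE my_keyspace.my_table" | grep -iE 'bloom_filter_fp_chance|chunk_length'
```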
> > > >
> > > > > On Jan 26, 2019, at 12:11 PM, Ayub M wrote:
> > > > >
> > > > > Cassandra node went down due to OOM, and checking /var/log/messages I see the below.
> > > > >
> > > > > ```
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/ mems_allowed=0
> > > > > ....
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 1294*4kB (UM) 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB (UEM) 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UEM) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache pages
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap cache
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache stats: add 0, delete 0, find 0/0
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Free swap  = 0kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Total swap = 0kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 16394647 pages RAM
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages HighMem/MovableOnly
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 310559 pages reserved
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2634]     0  2634    41614      326      82        0             0 systemd-journal
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2690]     0  2690    29793      541      27        0             0 lvmetad
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2710]     0  2710    11892      762      25        0         -1000 systemd-udevd
> > > > > .....
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [13774]     0 13774   459778    97729     429        0             0 Scan Factory
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14506]     0 14506    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14586]     0 14586    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14588]     0 14588    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14589]     0 14589    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14598]     0 14598    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14599]     0 14599    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14600]     0 14600    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14601]     0 14601    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [19679]     0 19679    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [19680]     0 19680    21628     5340      24        0             0 macompatsvc
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 9084]  1007  9084  2822449   260291     810        0             0 java
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 8509]  1007  8509 17223585 14908485   32510        0             0 java
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [21877]     0 21877   461828    97716     318        0             0 ScanAction Mgr
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [21884]     0 21884   496653    98605     340        0             0 OAS Manager
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [31718]    89 31718    25474      486      48        0             0 pickup
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 4891]  1007  4891    26999      191       9        0             0 iostat
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 4957]  1007  4957    26999      192      10        0             0 iostat
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Out of memory: Kill process 8509 (java) score 928 or sacrifice child
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Killed process 8509 (java) total-vm:68894340kB, anon-rss:59496344kB, file-rss:137596kB, shmem-rss:0kB
> > > > > ```
> > > > >
> > > > > Nothing else runs on this host except DSE Cassandra with search and monitoring agents. The max heap size is set to 31 GB, and the Cassandra java process seems to be using ~57 GB (RAM is 62 GB) at the time of the error.
> > > > > So I am guessing the JVM started using a lot of memory and that triggered the OOM error.
> > > > > Is my understanding correct? That this is a Linux-triggered kill of the JVM because the JVM was consuming more than the available memory?
> > > > >
> > > > > So in this case the JVM was using a max of 31 GB of heap, and the remaining 26 GB it was using is non-heap memory. Normally this process takes around 42 GB, and the fact that it was consuming 57 GB at the moment of the OOM makes me suspect the java process is the culprit rather than the victim.
> > > > >
> > > > > At the time of the issue no heap dump was taken; I have configured it now. But even if a heap dump had been taken, would it have helped figure out what is consuming more memory? A heap dump only covers the heap area, so what should be used to dump the non-heap memory? Native Memory Tracking is one thing I came across.
> > > > > Is there any way to have native memory dumped when the OOM occurs?
> > > > > What's the best way to monitor the JVM memory to diagnose OOM errors?
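Two notes on the last questions. The standard -XX:+HeapDumpOnOutOfMemoryError flag only fires on a Java-level OutOfMemoryError, not when the kernel OOM killer sends SIGKILL, so it would not have captured this event. For the native side, Native Memory Tracking plus jcmd is a reasonable starting point; a minimal sketch, assuming a recent HotSpot JVM, with <cassandra-pid> as a placeholder:

```
# 1. Enable Native Memory Tracking (roughly 5-10% overhead) by adding this
#    flag to conf/jvm.options (or cassandra-env.sh) and restarting the node:
#      -XX:NativeMemoryTracking=summary

# 2. Take a baseline after startup, then diff later to see which category
#    (Java Heap, Thread, Internal, Code, ...) is growing:
jcmd <cassandra-pid> VM.native_memory baseline
jcmd <cassandra-pid> VM.native_memory summary.diff

# 3. Compare against what the kernel charges the process. NMT only counts
#    memory the JVM itself allocates, so mmapped SSTables and allocations
#    made by native libraries will not show up - one reason NMT < RSS.
grep VmRSS /proc/<cassandra-pid>/status
pmap -x <cassandra-pid> | tail -1
```

For ongoing monitoring, logging VmRSS and the NMT summary side by side every minute or so is usually enough to show whether heap, thread stacks, direct buffers, or something outside the JVM's own accounting grows before the kernel steps in.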
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org