Subject: Re: Cassandra OOM on joining existing ring
From: Kunal Gangakhedkar
To: user@cassandra.apache.org
Date: Sat, 11 Jul 2015 00:14:26 +0530
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm

And here is my cassandra-env.sh: https://gist.github.com/kunalg/2c092cb2450c62be9a20

Kunal

On 11 July 2015 at 00:04, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
From jhat output, top 10 entries for "Instance Count for All Classes (excluding platform)" shows:

2088223 instances of class org.apache.cassandra.db.BufferCell
1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
503687 instances of class org.apache.cassandra.db.BufferDeletedCell
378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
101800 instances of class org.apache.cassandra.utils.concurrent.Ref
101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
71123 instances of class org.apache.cassandra.db.BufferDecoratedKey

At the bottom of the page, it shows:

Total of 8739510 instances occupying 193607512 bytes.
JFYI.

Kunal
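A histogram like the one above can be sanity-checked with a few lines of Python. This is only a sketch: it parses the "N instances of class X" lines as quoted in this message and reports how much of the total instance count the top entries cover, plus the average size per instance. The name `parse_histogram` is made up for illustration; the numbers are the ones quoted above.

```python
import re

# Lines copied from the jhat output quoted in this thread.
JHAT_LINES = """\
2088223 instances of class org.apache.cassandra.db.BufferCell
1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
503687 instances of class org.apache.cassandra.db.BufferDeletedCell
378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
101800 instances of class org.apache.cassandra.utils.concurrent.Ref
101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
71123 instances of class org.apache.cassandra.db.BufferDecoratedKey
"""

TOTAL_INSTANCES = 8739510  # "Total of 8739510 instances ..."
TOTAL_BYTES = 193607512    # "... occupying 193607512 bytes."

LINE_RE = re.compile(r"^(\d+) instances of class (\S+)$")


def parse_histogram(text):
    """Return a list of (class_name, instance_count) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match.group(2), int(match.group(1))))
    return rows


rows = parse_histogram(JHAT_LINES)
top_total = sum(count for _, count in rows)
share = top_total / TOTAL_INSTANCES
avg_bytes = TOTAL_BYTES / TOTAL_INSTANCES

print(f"top {len(rows)} classes: {top_total} instances ({share:.1%} of all)")
print(f"average size: {avg_bytes:.1f} bytes per instance")
print(f"total reported: {TOTAL_BYTES / 2**20:.0f} MiB")
```

The reported total works out to roughly 185 MiB, well under the 4GB and 8GB heaps mentioned elsewhere in the thread, so this particular view (which excludes platform classes) does not by itself account for the whole heap.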

On 10 July 2015 at 23:49, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
Thanks for the quick reply.
1. I don't know what thresholds I should look for. So, to save this back-and-forth, I'm attaching the cfstats output for the keyspace.

There is one table - daily_challenges - which shows compacted partition max bytes as ~460M and another one - daily_guest_logins - which shows compacted partition max bytes as ~36M.

Can that be a problem?
Here is the CQL schema for the daily_challenges column family:

CREATE TABLE app_10001.daily_challenges (
    segment_type text,
    date timestamp,
    user_id int,
    sess_id text,
    data text,
    deleted boolean,
    PRIMARY KEY (segment_type, date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);
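
Since segment_type is the only partition key column in this schema, every row for a given segment type lands in the same partition, which is consistent with the ~460M partition reported above. One common way to bound partition size is to add a time bucket to the partition key. The Python below is only an illustration of how such a bucket could be derived on the client side; the day bucket, the helper names, and the sample segment type are all assumptions, not something proposed in this thread.

```python
from datetime import datetime, timezone


def day_bucket(ts):
    """Hypothetical bucket: the UTC calendar day of a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(segment_type, ts):
    """Composite key (segment_type, day bucket) instead of segment_type alone."""
    return (segment_type, day_bucket(ts))


# Three rows of the same (made-up) segment type spread over two days.
rows = [
    ("active_users", datetime(2015, 7, 9, 23, 59, tzinfo=timezone.utc)),
    ("active_users", datetime(2015, 7, 10, 0, 1, tzinfo=timezone.utc)),
    ("active_users", datetime(2015, 7, 10, 18, 44, tzinfo=timezone.utc)),
]

# Group the rows the way a bucketed partition key would.
partitions = {}
for segment_type, ts in rows:
    partitions.setdefault(partition_key(segment_type, ts), []).append(ts)

for key, members in sorted(partitions.items()):
    print(key, len(members))
```

With a key like this, a read over a date range has to query one partition per day, which is the usual trade-off of bucketing.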


2. I don't know - how do I check? As I mentioned, I just installed the dsc21 update from datastax's debian repo (ver 2.1.7).

Really appreciate your help.

Thanks,
Kunal

On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.estevez@datastax.com> wrote:
1. You want to look at the # of sstables in cfhistograms, or in cfstats look at:
Compacted partition maximum bytes
Maximum live cells per slice
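
To avoid eyeballing cfstats for every table, the two fields above can be pulled out with a few lines of Python. This is a rough sketch: the sample text is made up to mimic the cfstats layout (it is not real output from this cluster, and the exact layout is an assumption), `parse_cfstats` and `flag_large` are hypothetical helpers, and the 10MB limit is just the rule of thumb mentioned further down the thread.

```python
import re

# Made-up text that mimics the cfstats layout; NOT real output from this cluster.
SAMPLE = """\
Keyspace: app_10001
\t\tTable: daily_challenges
\t\tCompacted partition maximum bytes: 460000000
\t\tMaximum live cells per slice: 1000
\t\tTable: daily_guest_logins
\t\tCompacted partition maximum bytes: 36000000
\t\tMaximum live cells per slice: 50
"""

# "[^:]*" tolerates a qualifier between the label and the colon.
FIELDS = {
    "max_partition_bytes": re.compile(r"Compacted partition maximum bytes[^:]*:\s*(\d+)"),
    "max_live_cells": re.compile(r"Maximum live cells per slice[^:]*:\s*([\d.]+)"),
}
TABLE_RE = re.compile(r"^\s*Table:\s*(\S+)")


def parse_cfstats(text):
    """Return {table_name: {field: value}} for the fields listed in FIELDS."""
    tables = {}
    current = None
    for line in text.splitlines():
        table = TABLE_RE.match(line)
        if table:
            current = tables.setdefault(table.group(1), {})
            continue
        if current is None:
            continue  # still in the keyspace-level header
        for name, pattern in FIELDS.items():
            found = pattern.search(line)
            if found:
                current[name] = float(found.group(1))
    return tables


def flag_large(tables, limit_bytes=10 * 1024 * 1024):
    """Names of tables whose largest partition exceeds limit_bytes."""
    return sorted(
        name for name, stats in tables.items()
        if stats.get("max_partition_bytes", 0) > limit_bytes
    )


stats = parse_cfstats(SAMPLE)
for name in flag_large(stats):
    print(name, int(stats[name]["max_partition_bytes"]))
```

In practice the text would come from the saved output of nodetool cfstats rather than from an inline sample.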

2. No, here's the env.sh from 3.0, which should work with some tweaks:

https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh

You'll at least have to modify the jamm version to what's in yours. I think it's 2.5



All the best,

Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com

DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
Thanks, Sebastian.

Couple of questions (I'm really new to cassandra):
1. How do I interpret the output of 'nodetool cfstats' to figure out the issues? Any documentation pointer on that would be helpful.

2. I'm primarily a python/c developer - so, totally clueless about the JVM environment. So, please bear with me as I would need a lot of hand-holding.
Should I just copy+paste the settings you gave and try to restart the failing cassandra server?

Thanks,
Kunal

On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.estevez@datastax.com> wrote:
#1 You need more information.

a) Take a look at your .hprof file (memory heap from the OOM) with an introspection tool like jhat or visualvm or java flight recorder and see what is using up your RAM.

b) How big are your large rows (use nodetool cfstats on each node)? If your data model is bad, you are going to have to re-design it no matter what.
#2 As a possible workaround, try using the G1 garbage collector with the settings from c* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr G1GC is much simpler than CMS and almost as good as a finely tuned CMS). Note: Use it with the latest Java 8 from Oracle. Do not set the newgen size, because G1 sets it dynamically:

# min and max heap sizes should be set to the same value to avoid
# stop-the-world GC pauses during resize, and so that we can lock the
# heap in memory on startup to prevent any of it from being swapped
# out.
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

# Per-thread stack size.
JVM_OPTS="$JVM_OPTS -Xss256k"

# Use the Hotspot garbage-first collector.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

# Have the JVM do less remembered set work during STW, instead
# preferring concurrent GC. Reduces p99.9 latency.
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

# The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
# Machines with > 10 cores may need additional threads.
# Increase to <= full cores (do not count HT cores).
#JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
#JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"

# Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
# 200ms is the JVM default and lowest viable setting
# 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
# Do reference processing in parallel GC.
JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"

# This may help eliminate STW.
# The default in Hotspot 8u40 is 40%.
#JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

# For workloads that do large allocations, increasing the region
# size may make things more efficient. Otherwise, let the JVM
# set this automatically.
#JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

# Make sure all memory is faulted and zeroed on startup.
# This helps prevent soft faults in containers and makes
# transparent hugepage allocation more effective.
JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"

# Biased locking does not benefit Cassandra.
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"

# Enable thread-local allocation blocks and allow the JVM to automatically
# resize them at runtime.
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"

# http://www.evanjones.ca/jvm-mmap-pause.html
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
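
Because the advice above is to switch collectors without setting the newgen size, it can help to check the edited cassandra-env.sh for leftovers before restarting. The Python sketch below scans the active (uncommented) JVM_OPTS lines and reports -Xmn or CMS-related flags that are still present next to -XX:+UseG1GC. The helper names and the sample file content are invented for illustration, and the list of conflicting flags is not exhaustive.

```python
import re

# Flags from a CMS-style setup that should not stay active next to G1.
# Illustrative, not exhaustive.
CONFLICTING = ("-Xmn", "-XX:+UseParNewGC", "-XX:+UseConcMarkSweepGC")

OPT_LINE = re.compile(r'^\s*JVM_OPTS="\$JVM_OPTS\s+(.*)"\s*$')


def active_flags(env_text):
    """Collect flags from uncommented JVM_OPTS="$JVM_OPTS ..." lines."""
    flags = []
    for line in env_text.splitlines():
        match = OPT_LINE.match(line)  # lines starting with '#' do not match
        if match:
            flags.extend(match.group(1).split())
    return flags


def g1_conflicts(env_text):
    """Return leftover flags that clash with -XX:+UseG1GC, if G1 is enabled."""
    flags = active_flags(env_text)
    if "-XX:+UseG1GC" not in flags:
        return []
    return [flag for flag in flags if flag.startswith(CONFLICTING)]


# Invented sample: G1 enabled, but an old newgen setting was left in place.
SAMPLE_ENV = '''\
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
#JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
'''

print(g1_conflicts(SAMPLE_ENV))
```

An empty list means no known leftovers were found; it does not prove the file is otherwise correct.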


All the best,

Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
I upgraded my instance from 8GB to a 14GB one.
Allocated 8GB to jvm heap in cassandra-env.sh.

And now, it crashes even faster with an OOM..

Earlier, with 4GB heap, I could go up to ~90% replication completion (as reported by nodetool netstats); now, with 8GB heap, I cannot even get there. I've already restarted the cassandra service 4 times with 8GB heap.

No clue what's going on.. :(

Kunal

On 10 July 2015 at 17:45, Jack Krupansky <jack.krupansky@gmail.com> wrote:
You, and only you, are responsible for knowing your data and data model.

If columns per row or rows per partition can be large, then an 8GB system is probably too small. But the real issue is that you need to keep your partition size from getting too large.

Generally, an 8GB system is okay, but only for reasonably-sized partitions, like under 10MB.


-- Jack Krupansky

On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
I'm new to cassandra.
How do I find those out? - mainly, the partition params that you asked for. Others, I think I can figure out.

We don't have any large objects/blobs in the column values - it's all textual, date-time, numeric and uuid data.

We use cassandra primarily to store segmentation data - with segment type as partition key. That is again divided into two separate column families; but they have similar structure.

Columns per row can be fairly large - each segment type as the row key and associated user ids and timestamp as column value.

Thanks,
Kunal

On 10 July 2015 at 16:36, Jack Krupansky <jack.krupansky@gmail.com> wrote:
What do your data and data model look like - partition size, rows per partition, number of columns per row, any large values/blobs in column values?

You could run fine on an 8GB system, but only if your rows and partitions are reasonably small. Any large partitions could blow you away.

-- Jack Krupansky

On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
Attaching the stack dump captured from the last OOM.

Kunal

On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
Forgot to mention: the data size is not that big - it's barely 10GB in all.

Kunal

On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhedkar@gmail.com> wrote:
Hi,

I have a 2-node setup on Azure (East US region) running Ubuntu Server 14.04 LTS.
Both nodes have 8GB RAM.

One of the nodes (seed node) died with OOM - so, I am trying to add a replacement node with the same configuration.

The problem is this new node also keeps dying with OOM - I've restarted the cassandra service like 8-10 times hoping that it would finish the replication. But it didn't help.

The one node that is still up is happily chugging along.
All nodes have similar configuration - with libjna installed.
Cassandra is installed from datastax's debian repo - pkg: dsc21 version 2.1.7.
I started off with the default configuration - i.e. the default cassandra-env.sh - which calculates the heap size automatically (1/4 * RAM = 2GB)
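
The 2GB figure is consistent with the heap rule in the stock cassandra-env.sh of that era, which is understood to pick max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)). The Python below is a sketch of that rule for illustration only; treat the formula as an assumption and check the heap calculation in your own cassandra-env.sh. `default_max_heap_mb` is a made-up name.

```python
def default_max_heap_mb(system_memory_mb):
    """Sketch of the assumed rule: max(min(RAM/2, 1024MB), min(RAM/4, 8192MB))."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)


# The two instance sizes mentioned in this thread.
for ram_gb in (8, 14):
    heap = default_max_heap_mb(ram_gb * 1024)
    print(f"{ram_gb}GB RAM -> default heap {heap}MB")
```

For the 8GB nodes described here the sketch gives 2048MB, matching the 2GB mentioned above.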

But, that didn't help. So, I then tried to increase the heap to 4GB manually and restarted. It still keeps crashing.

Any clue as to why it's happening?

Thanks,
Kunal











