From: Nicolas Labrot <nithril@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 21 Apr 2010 18:54:13 +0200
Subject: Re: Cassandra tuning for running test on a desktop

I don't have a website ;)

I'm testing the viability of Cassandra for storing XML documents and running
fast search queries. 4000 XML files (80 MB of XML) created with my data model
(one SC per XML node) yield 1,000,000 SCs, which make Cassandra go OOM with
Xmx 1GB. By contrast, an XML DB like eXist handles the 4000 XML docs without
any problem and with an acceptable amount of memory.

What I like about Cassandra is its simplicity and its scalability. eXist is
not able to scale with the data; the only viable alternative is MarkLogic,
which costs an arm and a leg... :)

I will install Linux and buy some memory to continue my tests.

Could a Cassandra developer give me the technical reason for this OOM?

On Wed, Apr 21, 2010 at 5:13 PM, Mark Greene <greenemj@gmail.com> wrote:

> Maybe, maybe not. Presumably if you are running an RDBMS with any reasonable
> amount of traffic nowadays, it's sitting on a machine with 4-8 GB of memory
> at least.
>
> On Wed, Apr 21, 2010 at 10:48 AM, Nicolas Labrot <nithril@gmail.com> wrote:
>
>> Thanks Mark.
>>
>> Cassandra is maybe too much for my need ;)
>>
>> On Wed, Apr 21, 2010 at 4:45 PM, Mark Greene <greenemj@gmail.com> wrote:
>>
>>> Hit send too early...
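As a rough back-of-envelope for the OOM question above: 80 MB of raw XML fans out into about a million supercolumn objects on the heap, and per-object JVM overhead dwarfs the ~100-byte payloads. A sketch; the per-object overhead figures are illustrative assumptions, not measured values:

```python
# Why ~80 MB of XML can strain a 1 GB heap once it becomes a million
# supercolumns: JVM object overhead dominates the small payloads.
# The two OVERHEAD constants below are assumptions for illustration.
PAYLOAD_BYTES = 100        # average data per supercolumn (from the thread)
COLUMNS_PER_SC = 6         # columns per supercolumn (from the thread)
COLUMN_OVERHEAD = 60       # assumed: object header, refs, name, timestamp
SC_OVERHEAD = 80           # assumed: supercolumn object plus its column map
NUM_SC = 1_000_000

bytes_per_sc = PAYLOAD_BYTES + SC_OVERHEAD + COLUMNS_PER_SC * COLUMN_OVERHEAD
total_bytes = bytes_per_sc * NUM_SC
print(f"~{bytes_per_sc} B per SC -> ~{total_bytes / 2**20:.0f} MiB live data")
```

Even at ~500 MiB of live objects, adding commit-log buffers, flush buffers, and memtables awaiting flush leaves little headroom in a 1 GB heap, which is consistent with the GC lines later in the thread showing ~900 MB used.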
>>>
>>> That being said, a lot of people running Cassandra in production are using
>>> 4-6 GB max heaps on 8 GB machines. I don't know if that helps, but
>>> hopefully it gives you some perspective.
>>>
>>> On Wed, Apr 21, 2010 at 10:39 AM, Mark Greene <greenemj@gmail.com> wrote:
>>>
>>>> RAM doesn't necessarily need to be proportional, but I would say the
>>>> number of nodes does. You can't just throw a bazillion inserts at one node.
>>>> The main benefit of Cassandra is that when you start hitting your
>>>> capacity, you add more machines and distribute the keys across more
>>>> machines.
>>>>
>>>> On Wed, Apr 21, 2010 at 9:07 AM, Nicolas Labrot <nithril@gmail.com> wrote:
>>>>
>>>>> So does it mean the RAM needed is proportional to the data handled?
>>>>>
>>>>> Or does Cassandra need a minimum amount of RAM when the dataset is big?
>>>>>
>>>>> I must confess this OOM behaviour is strange.
>>>>>
>>>>> On Wed, Apr 21, 2010 at 2:54 PM, Mark Jones <MJones@imagehawk.com> wrote:
>>>>>
>>>>>> On my 4 GB machine I'm giving it 3 GB and having no trouble with 60+
>>>>>> million 500-byte columns.
>>>>>>
>>>>>> *From:* Nicolas Labrot [mailto:nithril@gmail.com]
>>>>>> *Sent:* Wednesday, April 21, 2010 7:47 AM
>>>>>> *To:* user@cassandra.apache.org
>>>>>> *Subject:* Re: Cassandra tuning for running test on a desktop
>>>>>>
>>>>>> I have tried 1400M, and Cassandra OOMs too.
>>>>>>
>>>>>> Is there another solution? My data isn't very big.
>>>>>>
>>>>>> It seems to happen during the merge of the db.
>>>>>>
>>>>>> On Wed, Apr 21, 2010 at 2:14 PM, Mark Greene <greenemj@gmail.com> wrote:
>>>>>>
>>>>>> Try increasing Xmx. 1G is probably not enough for the amount of
>>>>>> inserts you are doing.
>>>>>>
>>>>>> On Wed, Apr 21, 2010 at 8:10 AM, Nicolas Labrot <nithril@gmail.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> For my first message I will first thank the Cassandra contributors for
>>>>>> their great work.
>>>>>>
>>>>>> I have a parameter issue with Cassandra (I hope it's just a parameter
>>>>>> issue).
I'm using Cassandra 0.6.1 with the Hector client on my desktop. It's a
>>>>>> simple dual core with 4 GB of RAM on WinXP. I have kept the default JVM
>>>>>> options inside cassandra.bat (Xmx1G).
>>>>>>
>>>>>> I'm trying to insert 3 million SCs with 6 columns each inside 1 CF
>>>>>> (named Super1). The insertion gets to 1 million SCs (without slowdown)
>>>>>> and then Cassandra crashes with an OOM. (I store an average of 100 bytes
>>>>>> per SC, with a max of 10 kB.)
>>>>>> I have aggressively decreased all the memory parameters without any
>>>>>> regard for consistency (my config is here [1]); the cache is turned off,
>>>>>> but Cassandra still goes OOM. I have attached the last lines of the
>>>>>> Cassandra log [2].
>>>>>>
>>>>>> What can I do to fix my issue? Is there another solution than
>>>>>> increasing the Xmx?
>>>>>>
>>>>>> Thanks for your help,
>>>>>>
>>>>>> Nicolas
>>>>>>
>>>>>> [1]
>>>>>>   <Keyspaces>
>>>>>>     <Keyspace Name="Keyspace1">
>>>>>>       <ColumnFamily Name="Super1"
>>>>>>                     ColumnType="Super"
>>>>>>                     CompareWith="BytesType"
>>>>>>                     CompareSubcolumnsWith="BytesType" />
>>>>>>       <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
>>>>>>       <ReplicationFactor>1</ReplicationFactor>
>>>>>>       <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
>>>>>>     </Keyspace>
>>>>>>   </Keyspaces>
>>>>>>   <CommitLogRotationThresholdInMB>32</CommitLogRotationThresholdInMB>
>>>>>>   <DiskAccessMode>auto</DiskAccessMode>
>>>>>>   <RowWarningThresholdInMB>64</RowWarningThresholdInMB>
>>>>>>   <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>>>>>>   <FlushDataBufferSizeInMB>16</FlushDataBufferSizeInMB>
>>>>>>   <FlushIndexBufferSizeInMB>4</FlushIndexBufferSizeInMB>
>>>>>>   <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>>>>>>   <MemtableThroughputInMB>16</MemtableThroughputInMB>
>>>>>>   <BinaryMemtableThroughputInMB>32</BinaryMemtableThroughputInMB>
>>>>>>   <MemtableOperationsInMillions>0.01</MemtableOperationsInMillions>
>>>>>>   <MemtableObjectCountInMillions>0.01</MemtableObjectCountInMillions>
>>>>>>   <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>>>>>>   <ConcurrentReads>4</ConcurrentReads>
>>>>>>   <ConcurrentWrites>8</ConcurrentWrites>
>>>>>> </Storage>
>>>>>>
>>>>>> [2]
>>>>>>  INFO 13:36:41,062 Super1 has reached its threshold; switching in a
>>>>>> fresh Memtable at
>>>>>> CommitLogContext(file='d:/cassandra/commitlog\CommitLog-1271849783703.log',
>>>>>> position=5417524)
>>>>>>  INFO 13:36:41,062 Enqueuing flush of Memtable(Super1)@15385755
>>>>>>  INFO 13:36:41,062 Writing Memtable(Super1)@15385755
>>>>>>  INFO 13:36:42,062 Completed flushing
>>>>>> d:\cassandra\data\Keyspace1\Super1-711-Data.db
>>>>>>  INFO 13:36:45,781 Super1 has reached its threshold; switching in a
>>>>>> fresh Memtable at
CommitLogContext(file='d:/cassandra/commitlog\CommitLog-1271849783703.log',
>>>>>> position=6065637)
>>>>>>  INFO 13:36:45,781 Enqueuing flush of Memtable(Super1)@15578910
>>>>>>  INFO 13:36:45,796 Writing Memtable(Super1)@15578910
>>>>>>  INFO 13:36:46,109 Completed flushing
>>>>>> d:\cassandra\data\Keyspace1\Super1-712-Data.db
>>>>>>  INFO 13:36:54,296 GC for ConcurrentMarkSweep: 7149 ms, 58337240
>>>>>> reclaimed leaving 922392600 used; max is 1174208512
>>>>>>  INFO 13:36:54,593 Super1 has reached its threshold; switching in a
>>>>>> fresh Memtable at
>>>>>> CommitLogContext(file='d:/cassandra/commitlog\CommitLog-1271849783703.log',
>>>>>> position=6722241)
>>>>>>  INFO 13:36:54,593 Enqueuing flush of Memtable(Super1)@24468872
>>>>>>  INFO 13:36:54,593 Writing Memtable(Super1)@24468872
>>>>>>  INFO 13:36:55,421 Completed flushing
>>>>>> d:\cassandra\data\Keyspace1\Super1-713-Data.db
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>  INFO 13:37:08,281 GC for ConcurrentMarkSweep: 5561 ms, 9432 reclaimed
>>>>>> leaving 971904520 used; max is 1174208512
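The ConcurrentMarkSweep lines in [2] point at the cause: a 7-second collection reclaims only ~56 MB and leaves the heap nearly 80% full, so the next memtable allocation tips it over. A small sketch of extracting those figures from one of the log lines quoted above:

```python
# Heap occupancy from the GC log in [2]: compare "used" against "max"
# after a ConcurrentMarkSweep collection. A CMS cycle that reclaims little
# while occupancy stays this high is the classic prelude to an OOM.
import re

log = ("INFO 13:36:54,296 GC for ConcurrentMarkSweep: 7149 ms, 58337240 "
       "reclaimed leaving 922392600 used; max is 1174208512")

match = re.search(r"(\d+) reclaimed leaving (\d+) used; max is (\d+)", log)
reclaimed, used, maximum = (int(g) for g in match.groups())
occupancy = used / maximum
print(f"reclaimed {reclaimed / 2**20:.0f} MiB, heap {occupancy:.0%} full")
```

At ~79% occupancy after a full CMS cycle, with memtable flushes still queueing behind slow disk writes, the heap has effectively no room left for the next batch of inserts.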

I'm testing the viability of Cassandr= a to store XML documents and make fast search queries. 4000 XML files (80MB= of XML) create with my datamodel (one SC per XML node) 1000000 SC which ma= ke Cassandra go OOM with Xmx 1GB. On the contrary an xml DB like eXist hand= les 4000 XML doc without any problem with an acceptable amount of memories.=

What I like with Cassandra is his simplicity and his scalability. eXist= is not able to scale with data, the only viable solution his marklogic whi= ch cost an harm and a feet... :)

I will install linux and buy some m= emories to continue my test.

Could a Cassandra developper give me the technical reason of this OOM ?=





On Wed, Apr 21, 2010 at = 5:13 PM, Mark Greene <greenemj@gmail.com> wrote:
Maybe, maybe not.= Presumably if you are running a RDMS with any reasonable amount of traffic= now a days, it's sitting on a machine with 4-8G of memory at least.=A0=


On Wed, Apr 21, 2010 at = 10:48 AM, Nicolas Labrot <nithril@gmail.com> wrote:
Thanks Mark.
<= br>Cassandra is maybe too much for my need ;)



On Wed, Apr 21, 2010 at 4:45 PM, Mark Greene <greenemj@gmail.com>= wrote:
Hit send to early= ....

That being said a lot of people running Cassandra in productio= n are using 4-6GB max heaps on 8GB machines, don't know if that helps b= ut hopefully gives you some perspective.


On Wed, Apr 21, 2010 at 10:39 AM, Mark Greene <greenemj@gmail.com>= wrote:
RAM doesn't necessarily need to be proportional but I would say the num= ber of nodes does. You can't just throw a bazillion inserts at one node= . This is the main benefit of Cassandra is that if you start hitting your c= apacity, you add more machines and distribute the keys=A0across=A0more mach= ines.


On Wed, Apr 21, 2010 at 9:07 AM, Nicolas Lab= rot <nithril@gmail.com> wrote:
So does it means the RAM needed is proportionnal with the data handled ?
Or Cassandra need a minimum amount or RAM when dataset is big?

= I must confess this OOM behaviour is strange.


On Wed, Apr 21, 2010 at 2:54 PM, Mark Jones <MJones@imagehawk.com= > wrote:

On my 4GB machine I=92m giving it 3GB and having no trouble with 60+ million 500 byte columns

=A0

From:= Nicolas Labrot [mailto:nithril@gmai= l.com]
Sent: Wednesday, April 21, 2010 7:47 AM
To: u= ser@cassandra.apache.org
Subject: Re: Cassandra tuning for running test on a desktop

=A0

I have try 1400M, and= Cassandra OOM too.

Is there another solution ? My data isn't very big.

It seems that is the merge of the db

On Wed, Apr 21, 2010 at 2:14 PM, Mark Greene <greenemj@gmail.com= > wrote:

Trying increasing Xmx. 1G is probably not enough for= the amount of inserts you are doing.

=A0

On Wed, Apr 21, 2010 at 8:10 AM, Nicolas Labrot <= nithril@gmail.com> wrote:

Hello,

For my first message I will first thanks Cassandra contributors for their g= reat works.

I have a parameter issue with Cassandra (I hope it's just a parameter i= ssue). I'm using Cassandra 6.0.1 with Hector client on my desktop. It's a = simple dual core with 4GB of RAM on WinXP. I have keep the default JVM option inside cassandra.bat (Xmx1G)

I'm trying to insert 3 millions of SC with 6 Columns each inside 1 CF (= named Super1). The insertion go to 1 millions of SC (without slowdown) and Cassan= dra crash because of an OOM. (I store an average of 100 bytes per SC with a max= of 10kB).
I have aggressively decreased all the memories parameters without any respe= ct to the consistency (My config is here [1]), the cache is turn off but Cassa= ndra still go to OOM. I have joined the last line of the Cassandra life [2].

What can I do to fix my issue ?=A0 Is there another solution than increasin= g the Xmx ?

Thanks for your help,

Nicolas





[1]
=A0 <Keyspaces>
=A0=A0=A0 <Keyspace Name=3D"Keyspace1">
=A0=A0=A0=A0=A0 <ColumnFamily Name=3D"Super1"
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 ColumnType=3D"Super"
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 CompareWith=3D"BytesType"
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 CompareSubcolumnsWith=3D"BytesType" />
=A0=A0=A0=A0=A0 <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStr= ategy</ReplicaPlacementStrategy>
=A0=A0=A0=A0=A0 <ReplicationFactor>1</ReplicationFactor>
=A0=A0=A0=A0=A0 <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPo= intSnitch>
=A0=A0=A0 </Keyspace>
=A0 </Keyspaces>
=A0 <CommitLogRotationThresholdInMB>32</CommitLogRotationThresholdInMB= >

=A0 <DiskAccessMode>auto</DiskAccessMode>
=A0 <RowWarningThresholdInMB>64</RowWarningThresholdInMB>
=A0 <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
=A0 <FlushDataBufferSizeInMB>16</FlushDataBufferSizeInMB>
=A0 <FlushIndexBufferSizeInMB>4</FlushIndexBufferSizeInMB>
=A0 <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>

=A0 <MemtableThroughputInMB>16</MemtableThroughputInMB>
=A0 <BinaryMemtableThroughputInMB>32</BinaryMemtableThroughputInMB>=
=A0 <MemtableOperationsInMillions>0.01</MemtableOperationsInMillions&g= t;
=A0 <MemtableObjectCountInMillions>0.01</MemtableObjectCountInMillions= >
=A0 <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes> =A0 <ConcurrentReads>4</ConcurrentReads>
=A0 <ConcurrentWrites>8</ConcurrentWrites>
</Storage>


[2]
=A0INFO 13:36:41,062 Super1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file=3D'd:/cassandra/commitlog\CommitLog-1= 271849783703.log', position=3D5417524)
=A0INFO 13:36:41,062 Enqueuing flush of Memtable(Super1)@15385755
=A0INFO 13:36:41,062 Writing Memtable(Super1)@15385755
=A0INFO 13:36:42,062 Completed flushing d:\cassandra\data\Keyspace1\Super1-= 711-Data.db
=A0INFO 13:36:45,781 Super1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file=3D'd:/cassandra/commitlog\CommitLog-1271849783703= .log', position=3D6065637)
=A0INFO 13:36:45,781 Enqueuing flush of Memtable(Super1)@15578910
=A0INFO 13:36:45,796 Writing Memtable(Super1)@15578910
=A0INFO 13:36:46,109 Completed flushing d:\cassandra\data\Keyspace1\Super1-712-Data.db
=A0INFO 13:36:54,296 GC for ConcurrentMarkSweep: 7149 ms, 58337240 reclaime= d leaving 922392600 used; max is 1174208512
=A0INFO 13:36:54,593 Super1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file=3D'd:/cassandra/commitlog\CommitLog-1271849783703= .log', position=3D6722241)
=A0INFO 13:36:54,593 Enqueuing flush of Memtable(Super1)@24468872
=A0INFO 13:36:54,593 Writing Memtable(Super1)@24468872
=A0INFO 13:36:55,421 Completed flushing d:\cassandra\data\Keyspace1\Super1-713-Data.dbjava.lang.OutOfMemoryError: J= ava heap space
=A0INFO 13:37:08,281 GC for ConcurrentMarkSweep: 5561 ms, 9432 reclaimed leaving 971904520 used; max is 1174208512

=A0

=A0







--001517478d96330d290484c20b29--