incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: OOM Exception
Date Thu, 17 Dec 2009 03:03:05 GMT
Yes, the only ones that can't easily be changed are the Partitioner
and the ReplicationStrategy.

On Wed, Dec 16, 2009 at 8:48 PM, Brian Burruss <bburruss@real.com> wrote:
> Glad to hear "that" bug is fixed ;)
>
> Can the configuration params like memtable size be changed between server
> starts without clearing the data?
>
>
> Jonathan Ellis <jbellis@gmail.com> wrote:
>
>
> You're OOMing after log replay finishes there.  So I can still
> maintain that beta2 fixed the "replay uses more memory" bug :)
>
> It looks like you're running out of memory when the other node
> restarts, and it needs to read the hinted rows into memory to send
> them over.
>
> I suggest halving your MemtableSizeInMB, 1.5GB is pretty large.
>
> On Wed, Dec 16, 2009 at 7:01 PM, Brian Burruss <bburruss@real.com> wrote:
>> attached ... the log starts when i restarted server.  notice that not too
>> far into it is when the other node went down because of OOM and i restarted
>> it as well.
>>
>> ________________________________________
>> From: Jonathan Ellis [jbellis@gmail.com]
>> Sent: Wednesday, December 16, 2009 4:53 PM
>> To: cassandra-user@incubator.apache.org
>> Subject: Re: OOM Exception
>>
>> sorry, i meant the system.log the 2nd time (clear it out before
>> replaying so it's not confused w/ other info, pls)
>>
>> On Wed, Dec 16, 2009 at 5:39 PM, Brian Burruss <bburruss@real.com> wrote:
>>> is this what you want?  they are big - i'd rather not spam everyone with
>>> them.  if you need them or the hprof files i can tar them and send them to
>>> you.
>>>
>>> thx!
>>>
>>>
>>> [bburruss@gen-app02 cassandra]$ ls -l ~/cassandra/btoddb/commitlog/
>>> total 597228
>>> -rw-rw-r-- 1 bburruss bburruss 134219796 Dec 16 13:52 CommitLog-1260995895123.log
>>> -rw-rw-r-- 1 bburruss bburruss 134218547 Dec 16 13:52 CommitLog-1260997811317.log
>>> -rw-rw-r-- 1 bburruss bburruss 134218331 Dec 16 13:52 CommitLog-1260998497744.log
>>> -rw-rw-r-- 1 bburruss bburruss 134219677 Dec 16 13:53 CommitLog-1261000330587.log
>>> -rw-rw-r-- 1 bburruss bburruss  74055680 Dec 16 14:49 CommitLog-1261000439079.log
>>> [bburruss@gen-app02 cassandra]$
>>>
>>> ________________________________________
>>> From: Jonathan Ellis [jbellis@gmail.com]
>>> Sent: Wednesday, December 16, 2009 3:29 PM
>>> To: cassandra-user@incubator.apache.org
>>> Subject: Re: OOM Exception
>>>
>>> How large are the log files being replayed?
>>>
>>> Can you attach the log from a replay attempt?
>>>
>>> On Wed, Dec 16, 2009 at 5:21 PM, Brian Burruss <bburruss@real.com> wrote:
>>>> sorry, thought i included everything ;)
>>>>
>>>> however, i am using beta2
>>>>
>>>> ________________________________________
>>>> From: Jonathan Ellis [jbellis@gmail.com]
>>>> Sent: Wednesday, December 16, 2009 3:18 PM
>>>> To: cassandra-user@incubator.apache.org
>>>> Subject: Re: OOM Exception
>>>>
>>>> What version are you using?  0.5 beta2 fixes the
>>>> using-more-memory-on-startup problem.
>>>>
>>>> On Wed, Dec 16, 2009 at 5:16 PM, Brian Burruss <bburruss@real.com> wrote:
>>>>> i'll put my question first:
>>>>>
>>>>> - how can i determine how much RAM is required by cassandra?  (for
>>>>> normal operation and restarting server)
>>>>>
>>>>> *** i've attached my storage-conf.xml
>>>>>
>>>>> i've gotten several more OOM exceptions since i mentioned it a week or
>>>>> so ago.  i started from a fresh database a couple days ago and have been
>>>>> adding 2k blocks of data keyed off a random integer at the rate of about
>>>>> 400/sec.  i have a 2 node cluster, RF=2, Consistency for read/write is
>>>>> ONE.  there are ~70,420,082 2k blocks of data in the database.
>>>>>
>>>>> i used the default memory setup of Xmx1G when i started a couple days
>>>>> ago.  as the database grew to ~180G (reported by unix du command) both
>>>>> servers OOM'ed at about the same time, within 10 minutes of each other.
>>>>> well needless to say, my cluster is dead.  so i upped the memory to 3G
>>>>> and the servers tried to come back up, but one died again with OOM.
>>>>>
>>>>> Before cleaning the disk and starting over a couple days ago, i played
>>>>> the game of "jack up the RAM", but eventually i didn't want to up it
>>>>> anymore when i got to 5G.  the parameter, SSTable.INDEX_INTERVAL, was
>>>>> discussed a few days ago that would change the number of "keys" cached
>>>>> in memory, so i could modify that at the cost of read performance, but
>>>>> doing the math, 3G should be plenty of room.
>>>>>
>>>>> it seems like startup requires more RAM than just normal running.
>>>>>
>>>>> so this of course concerns me.
>>>>>
>>>>> i have the hprof files from when the server initially crashed and when
>>>>> it crashed trying to restart if anyone wants them
>>>>>
>>>>
>>>
>>
>
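
As a rough illustration of the sizing question raised in this thread, the figures quoted in the messages can be put side by side in a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions, not Cassandra's own memory accounting. The commit log sizes, key count, heap size, and memtable threshold are taken from the messages; the index sampling interval of 128 is an assumption (the thread names SSTable.INDEX_INTERVAL but does not give its value), and all function names are hypothetical.

```python
# Back-of-the-envelope figures from the thread. Nothing here calls Cassandra;
# it only does arithmetic on numbers quoted in the messages.

# Commit log segment sizes in bytes, copied from the `ls -l` listing.
COMMITLOG_SEGMENT_BYTES = [134219796, 134218547, 134218331, 134219677, 74055680]

KEY_COUNT = 70_420_082         # "~70,420,082 2k blocks" per the original post
HEAP_GB = 3.0                  # heap after raising -Xmx to 3G
MEMTABLE_GB = 1.5              # memtable threshold described as "1.5GB"
ASSUMED_INDEX_INTERVAL = 128   # ASSUMPTION: not stated anywhere in the thread


def commitlog_total_mib(segment_bytes):
    """Total size of the commit log segments to be replayed, in MiB."""
    return sum(segment_bytes) / 2**20


def sampled_index_entries(key_count, interval):
    """Number of keys held in memory if one key in `interval` is sampled."""
    return -(-key_count // interval)  # integer ceiling division


def memtable_heap_fraction(memtable_gb, heap_gb):
    """Share of the heap that a single full memtable threshold represents."""
    return memtable_gb / heap_gb


if __name__ == "__main__":
    print("commit log to replay (MiB):",
          round(commitlog_total_mib(COMMITLOG_SEGMENT_BYTES), 1))
    print("sampled index entries:",
          sampled_index_entries(KEY_COUNT, ASSUMED_INDEX_INTERVAL))
    print("memtable threshold / heap:",
          memtable_heap_fraction(MEMTABLE_GB, HEAP_GB))
```

With these inputs the memtable threshold alone equals half of the 3G heap, which is consistent with the suggestion to halve MemtableSizeInMB. The sketch does not model per-object JVM overhead, hinted rows read into memory, or compaction, so it gives a lower bound at best.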
