incubator-cassandra-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: Cassandra OOM on repair.
Date Sun, 17 Jul 2011 18:27:49 GMT
Looks like the problem is in this code:

    public IndexSummary(long expectedKeys)
    {
        long expectedEntries = expectedKeys / DatabaseDescriptor.getIndexInterval();
        if (expectedEntries > Integer.MAX_VALUE)
            // TODO: that's a _lot_ of keys, or a very low interval
            throw new RuntimeException("Cannot use index_interval of "
                                       + DatabaseDescriptor.getIndexInterval()
                                       + " with " + expectedKeys + " (expected) keys.");
        indexPositions = new ArrayList<KeyPosition>((int)expectedEntries);
    }

I have too many keys and too small an index interval.
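
As a rough illustration (a sketch with assumed numbers, not figures measured on my
cluster): the summary keeps one entry per index_interval keys, so with many keys and
a small interval the entry count and heap use blow up quickly.

    // Back-of-the-envelope estimate with assumed numbers.
    // One IndexSummary entry is kept per index_interval keys; each entry holds
    // the key bytes plus a long file position, so assume ~64 bytes per entry.
    public class IndexSummaryFootprint
    {
        public static void main(String[] args)
        {
            long expectedKeys = 2000000000L; // assumed keys in one sstable
            int indexInterval = 16;          // assumed (low) index_interval
            int bytesPerEntry = 64;          // assumed key size + object overhead

            long entries = expectedKeys / indexInterval;
            long megabytes = entries * bytesPerEntry >> 20;
            System.out.println(entries + " summary entries, ~" + megabytes + " MB of heap");
        }
    }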

To fix this, I can:
1) reduce the number of keys - rewrite the app and sacrifice balance
2) increase index_interval - which hurts other column families

A question:
Are there any drawbacks to using a different indexInterval for each column family
in a keyspace? (Suppose I write a patch.)
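
Roughly what I have in mind (just a sketch of the change, with a hypothetical
per-column-family accessor; not code that exists in 0.8.1):

    // Sketch: the constructor takes the interval as a parameter instead of
    // reading the global DatabaseDescriptor setting.
    public IndexSummary(long expectedKeys, int indexInterval)
    {
        long expectedEntries = expectedKeys / indexInterval;
        if (expectedEntries > Integer.MAX_VALUE)
            throw new RuntimeException("Cannot use index_interval of " + indexInterval
                                       + " with " + expectedKeys + " (expected) keys.");
        indexPositions = new ArrayList<KeyPosition>((int) expectedEntries);
    }

    // Callers would then pass something like metadata.getIndexInterval(), a
    // hypothetical per-column-family value, instead of the global setting.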

2011/7/15 Andrey Stepachev <octo47@gmail.com>

> Looks like the key indexes eat all the memory:
>
> http://paste.kde.org/97213/
>
>
> 2011/7/15 Andrey Stepachev <octo47@gmail.com>
>
>> UPDATE:
>>
>> I found that:
>> a) with a minimum of 10G Cassandra survives.
>> b) I have ~1000 sstables.
>> c) CompactionManager uses PrecompactedRow instead of LazilyCompactedRow.
>>
>> So I have a question:
>> a) if a row is bigger than 64MB before compaction, why is it compacted in
>> memory?
>> b) if it is smaller, what eats so much memory?
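>>
>> As a sketch of what I expect (hypothetical helper, not the real CompactionManager
>> code; the 64MB figure is the in_memory_compaction_limit_in_mb setting):
>>
>>     // Hypothetical illustration of the expected choice between the two row
>>     // compaction paths, based purely on serialized row size vs. the limit.
>>     static String expectedCompactionPath(long rowSizeBytes, long inMemoryLimitBytes)
>>     {
>>         return rowSizeBytes > inMemoryLimitBytes
>>                 ? "LazilyCompactedRow"   // streamed, heap stays bounded
>>                 : "PrecompactedRow";     // whole row materialized on heap
>>     }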
>>
>> 2011/7/15 Andrey Stepachev <octo47@gmail.com>
>>
>>> Hi all.
>>>
>>> Cassandra constantly OOMs on repair or compaction. Increasing memory
>>> doesn't help (6G).
>>> I can give it more, but I think this is not a normal situation.
>>> Cluster has 4 nodes. RF=3.
>>> Cassandra version 0.8.1
>>>
>>> Ring looks like this:
>>> Address         DC          Rack        Status State   Load       Owns    Token
>>>                                                                           127605887595351923798765477786913079296
>>> xxx.xxx.xxx.66  datacenter1 rack1       Up     Normal  176.96 GB  25.00%  0
>>> xxx.xxx.xxx.69  datacenter1 rack1       Up     Normal  178.19 GB  25.00%  42535295865117307932921825928971026432
>>> xxx.xxx.xxx.67  datacenter1 rack1       Up     Normal  178.26 GB  25.00%  85070591730234615865843651857942052864
>>> xxx.xxx.xxx.68  datacenter1 rack1       Up     Normal  175.2 GB   25.00%  127605887595351923798765477786913079296
>>>
>>> About schema:
>>> I have big rows (>100k, up to several million). But as far as I know, this is
>>> normal for Cassandra.
>>> Everything works relatively well until I start long-running pre-production
>>> tests. I load data, and after a while (~4 hours) the cluster begins to time out
>>> and then some nodes die with OOM.
>>> My app retries sending, so after a short period all nodes go down.
>>> Very nasty.
>>>
>>> But now I can OOM nodes simply by calling nodetool repair.
>>> In the logs http://paste.kde.org/96811/ it is clear how the heap rockets up to
>>> the upper limit.
>>> cfstats shows: http://paste.kde.org/96817/
>>> config is: http://paste.kde.org/96823/
>>> The question is: does anybody know what this means? Why does Cassandra try to
>>> load something big into memory at once?
>>>
>>> A.
>>>
>>
>>
>
