incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: FW: Very slow batch insert using version 0.7.2
Date Sat, 12 Mar 2011 00:09:23 GMT
Absolutely right!  Thanks, fixed for 0.7.4.

On Fri, Mar 11, 2011 at 4:14 PM, Erik Forkalsrud <eforkalsrud@cj.com> wrote:
> On 03/11/2011 12:13 PM, Jonathan Ellis wrote:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2158, fixed in 0.7.3
>>
>> you could have saved a lot of time just by upgrading first. :)
>
>
> It looks like the fix isn't entirely correct.  The bug is still in 0.7.3.
> In Memtable.java, the line:
>
>   THRESHOLD = cfs.getMemtableThroughputInMB() * 1024 * 1024;
>
> should be changed to:
>
>   THRESHOLD = cfs.getMemtableThroughputInMB() * 1024L * 1024L;
>
>
> Here's some code that illustrates the difference:
>
>    public void testMultiplication() {
>        int memtableThroughputInMB = 2300;
>        long thresholdA = memtableThroughputInMB * 1024 * 1024;
>        long thresholdB = memtableThroughputInMB * 1024L * 1024L;
>        System.out.println("a=" + thresholdA + " b=" + thresholdB);
>    }
>
>
>
> - Erik -
>
>
>> On Fri, Mar 11, 2011 at 2:02 PM, Erik Forkalsrud <eforkalsrud@cj.com> wrote:
>>>
>>> On 03/11/2011 04:56 AM, Zhu Han wrote:
>>>>
>>>> When I run it on my laptop (Fedora 14, 64-bit, 4 cores, 8GB RAM) it
>>>> flushes one Memtable with 5000 operations.
>>>> When I run it on a server (RHEL5, 64-bit, 16 cores, 96GB RAM) it
>>>> flushes 100 Memtables with anywhere between 1 operation and 359
>>>> operations (35 bytes and 12499 bytes).
>>>
>>> What are the settings for commit log flush, periodic or batch?
>>>
>>>
>>> It's whatever the default setting is (in the cassandra.yaml that is
>>> packaged in the apache-cassandra-0.7.3-bin.tar.gz download), specifically:
>>>
>>>    commitlog_rotation_threshold_in_mb: 128
>>>    commitlog_sync: periodic
>>>    commitlog_sync_period_in_ms: 10000
>>>    flush_largest_memtables_at: 0.75
>>>
>>> If I describe keyspace I get:
>>>
>>>    [default@unknown] describe keyspace Events;
>>>    Keyspace: Events:
>>>      Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>>        Replication Factor: 1
>>>      Column Families:
>>>        ColumnFamily: Event
>>>          Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
>>>          Row cache size / save period: 0.0/0
>>>          Key cache size / save period: 200000.0/14400
>>>          Memtable thresholds: 14.109375/3010/1440
>>>          GC grace seconds: 864000
>>>          Compaction min/max thresholds: 4/32
>>>          Read repair chance: 1.0
>>>          Built indexes: []
>>>
>>>
>>> It turns out my suspicion was right. When I tried overriding the JVM
>>> memory parameters calculated in conf/cassandra-env.sh to use the values
>>> calculated on my 8GB laptop, like this:
>>>
>>>    MAX_HEAP_SIZE=3932m HEAP_NEWSIZE=400m ./mutate.sh
>>>
>>> That made the server behave much better. This time it kept all 5000
>>> operations in a single Memtable. Also, when running with these memory
>>> settings, the Memtable thresholds changed to "1.1390625/243/1440" (from
>>> "14.109375/3010/1440"). All the other output from "describe keyspace"
>>> remains the same.
>>>
>>> So it looks like something goes wrong when Cassandra gets too much
>>> memory.
>>>
>>>
>>> --
>>> Erik Forkalsrud
>>> Commission Junction
>>>
>>>
>>
>>
>
>
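
For anyone reading this thread later, the overflow in Erik's quoted test can be reproduced as a standalone program. The sketch below is illustrative only: the class and method names are hypothetical and not taken from the Cassandra source. It assumes that the middle value of "Memtable thresholds" (243 and 3010 in the quoted output) is the memtable throughput in MB, and that a negative threshold makes the flush check trigger on almost every write, which would match the 100 tiny Memtables seen on the 96GB server.

```java
public class ThresholdOverflow {

    // All three operands are int, so the product is computed in 32-bit
    // arithmetic and wraps around before it is widened to long.
    static long thresholdWithIntMath(int memtableThroughputInMB) {
        return memtableThroughputInMB * 1024 * 1024;
    }

    // The 1024L literal promotes the multiplication to 64-bit arithmetic,
    // so the full value is preserved.
    static long thresholdWithLongMath(int memtableThroughputInMB) {
        return memtableThroughputInMB * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // 243 and 3010 are the throughput values from the two
        // "describe keyspace" outputs; 2300 is the value from the quoted test.
        int[] samples = {243, 2300, 3010};
        for (int mb : samples) {
            System.out.println(mb + " MB: int math=" + thresholdWithIntMath(mb)
                    + " long math=" + thresholdWithLongMath(mb));
        }
        // prints:
        // 243 MB: int math=254803968 long math=254803968
        // 2300 MB: int math=-1883242496 long math=2411724800
        // 3010 MB: int math=-1138753536 long math=3156213760
    }
}
```

Any throughput of 2048 MB or more pushes the int product past Integer.MAX_VALUE, which is why the 8GB laptop (243 MB) behaved and the 96GB server (3010 MB) did not.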



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
