incubator-cassandra-user mailing list archives

From Erik Forkalsrud <eforkals...@cj.com>
Subject Re: FW: Very slow batch insert using version 0.7.2
Date Fri, 11 Mar 2011 22:14:39 GMT
On 03/11/2011 12:13 PM, Jonathan Ellis wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-2158, fixed in 0.7.3
>
> you could have saved a lot of time just by upgrading first. :)


It looks like the fix isn't entirely correct: the bug is still present in
0.7.3. In Memtable.java, the line:

    THRESHOLD = cfs.getMemtableThroughputInMB() * 1024 * 1024;

should be changed to:

    THRESHOLD = cfs.getMemtableThroughputInMB() * 1024L * 1024L;
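As an aside, the difference between the two lines can be made to fail loudly rather than wrap silently. The sketch below uses Math.multiplyExact, which throws ArithmeticException on overflow. It only exists in Java 8 and later, so it is a modern illustration, not something the 0.7.x code base could use. The class and helper names are hypothetical.

```java
public class ThresholdOverflowCheck {
    // Hypothetical helper: true if throughputInMB * 1024 * 1024 overflows
    // when evaluated with int arithmetic, as in the original line.
    static boolean intProductOverflows(int throughputInMB) {
        try {
            Math.multiplyExact(Math.multiplyExact(throughputInMB, 1024), 1024);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Hypothetical helper: the same product with checked long arithmetic,
    // matching the widening done by the proposed 1024L * 1024L fix.
    static long thresholdBytes(int throughputInMB) {
        return Math.multiplyExact((long) throughputInMB, 1024L * 1024L);
    }

    public static void main(String[] args) {
        int mb = 2300;
        System.out.println("int overflows: " + intProductOverflows(mb));
        System.out.println("long threshold: " + thresholdBytes(mb));
    }
}
```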


Here's some code that illustrates the difference:

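    public class MultiplicationTest {
        public static void main(String[] args) {
            int memtableThroughputInMB = 2300;
            // int * int * int: overflows before the result is widened to long
            long thresholdA = memtableThroughputInMB * 1024 * 1024;
            // 1024L forces every multiplication to use long arithmetic
            long thresholdB = memtableThroughputInMB * 1024L * 1024L;
            // prints a=-1883242496 b=2411724800
            System.out.println("a=" + thresholdA + " b=" + thresholdB);
        }
    }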
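Plugging in the two throughput values from the "describe keyspace" output quoted below gives the following. The sketch assumes the middle field of "Memtable thresholds" is the memtable throughput in MB: 3010 on the 96GB server with default heap sizing, and 243 with the laptop heap settings. The class and method names are hypothetical. With the int-only expression, 3010 MB wraps to a negative byte count, while 243 MB fits either way. A negative threshold would plausibly make every memtable look "full" and trigger a flush, which would fit the flood of tiny flushes reported below. That connection is an inference, not something verified here.

```java
public class MemtableThresholdDemo {
    // Mirrors the unfixed expression: all-int arithmetic,
    // widened to long only after it has already overflowed.
    static long intThreshold(int throughputInMB) {
        return throughputInMB * 1024 * 1024;
    }

    // Mirrors the proposed fix: 1024L forces long arithmetic throughout.
    static long longThreshold(int throughputInMB) {
        return throughputInMB * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // Assumed throughput values (MB) taken from the two "describe keyspace" runs.
        int[] throughputs = {3010, 243};
        for (int mb : throughputs) {
            System.out.println(mb + " MB: int=" + intThreshold(mb)
                    + " long=" + longThreshold(mb));
        }
    }
}
```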



- Erik -


> On Fri, Mar 11, 2011 at 2:02 PM, Erik Forkalsrud <eforkalsrud@cj.com> wrote:
>> On 03/11/2011 04:56 AM, Zhu Han wrote:
>>> When I run it on my laptop (Fedora 14, 64-bit, 4 cores, 8GB RAM), it
>>> flushes one Memtable with 5000 operations.
>>> When I run it on a server (RHEL5, 64-bit, 16 cores, 96GB RAM), it flushes
>>> 100 Memtables with anywhere between 1 and 359 operations (between 35
>>> and 12499 bytes).
>> What are the commit log flush settings: periodic or batch?
>>
>>
>> It's whatever the default setting is (in the cassandra.yaml packaged in the
>> apache-cassandra-0.7.3-bin.tar.gz download), specifically:
>>
>>     commitlog_rotation_threshold_in_mb: 128
>>     commitlog_sync: periodic
>>     commitlog_sync_period_in_ms: 10000
>>     flush_largest_memtables_at: 0.75
>>
>> If I describe keyspace I get:
>>
>>     [default@unknown] describe keyspace Events;
>>     Keyspace: Events:
>>       Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>         Replication Factor: 1
>>       Column Families:
>>         ColumnFamily: Event
>>           Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
>>           Row cache size / save period: 0.0/0
>>           Key cache size / save period: 200000.0/14400
>>           Memtable thresholds: 14.109375/3010/1440
>>           GC grace seconds: 864000
>>           Compaction min/max thresholds: 4/32
>>           Read repair chance: 1.0
>>           Built indexes: []
>>
>>
>> It turns out my suspicion was right. I tried overriding the JVM memory
>> parameters calculated in conf/cassandra-env.sh with the values calculated
>> on my 8GB laptop, like this:
>>
>>     MAX_HEAP_SIZE=3932m HEAP_NEWSIZE=400m ./mutate.sh
>>
>> That made the server behave much better. This time it kept all 5000
>> operations in a single Memtable. Also, when running with these memory
>> settings, the Memtable thresholds changed to "1.1390625/243/1440" (from
>> "14.109375/3010/1440"); all the other output from "describe keyspace"
>> remained the same.
>>
>> So it looks like something goes wrong when cassandra gets too much memory.
>>
>>
>> --
>> Erik Forkalsrud
>> Commission Junction
>>
>>
>
>

