cassandra-user mailing list archives

From aaron morton
Subject Re: Configuration for large number of inserts
Date Thu, 24 Mar 2011 20:26:07 GMT
With only a 2GB heap you are pushing things a little, see
for guidelines on how to estimate the size.
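
As a rough illustration of that kind of estimate, here is a minimal sketch. It assumes
the 0.7-era rule of thumb that the heap should be at least
memtable_throughput_in_mb * 3 * (number of hot column families) + 1GB + internal caches.
That formula is my assumption of what the guideline says, and the class and method
names are made up for the example.

```java
// Rough heap estimate for a write-heavy Cassandra 0.7 node.
// ASSUMPTION: heap >= memtable_throughput_in_mb * 3 * hot CFs + 1 GB + caches.
// HeapEstimate and minHeapMb are hypothetical names, not part of Cassandra.
final class HeapEstimate {

    /** Returns the estimated minimum heap in MB under the assumed rule of thumb. */
    static long minHeapMb(int memtableThroughputMb, int hotColumnFamilies, long cacheMb) {
        // Each hot column family may hold roughly three memtables' worth of data in memory.
        long memtablesMb = 3L * memtableThroughputMb * hotColumnFamilies;
        // 1 GB (1024 MB) is the assumed baseline for Cassandra internals.
        return memtablesMb + 1024L + cacheMb;
    }

    public static void main(String[] args) {
        // Values from the quoted settings: 128 MB throughput, one column family, caches off.
        long estimateMb = minHeapMb(128, 1, 0);
        System.out.println("estimated minimum heap: " + estimateMb + " MB");
    }
}
```

With the quoted settings and the caches turned off this comes to about 1408 MB, which
leaves little headroom in a 2GB heap.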

- turn off the key and row caches
- consider reducing memtable_operations_in_millions. AFAIK memtable_throughput_in_mb is
calculated from the column values only, so if you have lots of small columns it will take
a while to trigger a flush.
- check the min and max compaction thresholds using JConsole or cassandra-cli. If compaction
is disabled you should see "Compaction is currently disabled." logged at DEBUG level.
- throttle your app to allow GC to catch up
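
For the last point, a client-side throttle could look something like the sketch below.
It only paces batches to a target rate; the class name, the 5000 rows/s limit and the
batch size of 1000 are all made-up values for illustration, and the actual Pelops write
call is left as a placeholder comment.

```java
import java.util.concurrent.TimeUnit;

// Minimal client-side pacing sketch. InsertThrottle is a hypothetical helper,
// not part of Pelops or Cassandra.
final class InsertThrottle {
    private final int maxRowsPerSecond;

    InsertThrottle(int maxRowsPerSecond) {
        if (maxRowsPerSecond <= 0) {
            throw new IllegalArgumentException("maxRowsPerSecond must be positive");
        }
        this.maxRowsPerSecond = maxRowsPerSecond;
    }

    /** How long to pause so a batch of batchSize rows stays within the rate limit. */
    static long pauseMillis(int batchSize, int maxRowsPerSecond, long elapsedMillis) {
        // Time the batch is allowed to take at the target rate.
        long budgetMillis = (1000L * batchSize) / maxRowsPerSecond;
        return Math.max(0L, budgetMillis - elapsedMillis);
    }

    /** Sleeps for whatever is left of the batch's time budget. */
    void pace(int batchSize, long batchStartNanos) throws InterruptedException {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - batchStartNanos);
        long pause = pauseMillis(batchSize, maxRowsPerSecond, elapsedMillis);
        if (pause > 0) {
            Thread.sleep(pause);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        InsertThrottle throttle = new InsertThrottle(5000); // assumed limit, tune by watching GC
        for (int batch = 0; batch < 3; batch++) {
            long start = System.nanoTime();
            // Placeholder: write one batch of 1000 rows with your client here.
            throttle.pace(1000, start);
        }
    }
}
```

Start with a conservative rate and raise it while watching GC activity and the pending
compactions, rather than picking a number up front.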

Hope that helps. 

On 25 Mar 2011, at 00:17, Adam Briffett wrote:

> Hi,
> When doing bulk inserts of data using Pelops (~1000 million rows,
> column counts varying from 1 - 100,000 but more skewed towards fewer
> columns), we're ultimately getting a server OOME using 0.7.4. I've
> attempted to follow other pointers on this issue (reducing threshold
> before memtables flushed to disk, increasing heap space), turned off
> compaction (although it seems to still be happening), and also tried
> reducing the value of index_interval to avoid using up space.
> We're using a single box for this test with attached .yaml and a heap
> size of 2GB, we're also using a single keyspace and column family, the
> settings for which are below (we're creating it using Pelops rather
> than in the .yaml):
> cf.column_type = "Standard";
> cf.comparator_type = "UTF8Type";
> cf.key_cache_size = 200d;
> cf.row_cache_size = 16d;
> cf.memtable_throughput_in_mb = 128;
> cf.memtable_operations_in_millions = 0.3;
> cf.min_compaction_threshold = 0;
> cf.max_compaction_threshold = 0;
> One issue is that compaction still appears to be happening, as if I
> check using nodetool compactionstats there are minor compactions
> piling up (also these get into the thousands, it seems they're being
> created faster than they can be addressed)
> Can anyone suggest anywhere we might be going wrong? As I say, at the
> present we're just looking to do a bulk insert, no read activity until
> the writes have completed.
> Thanks in advance,
> Adam
> <cassandra.yaml>
