incubator-cassandra-user mailing list archives

From Filippo Diotalevi <>
Subject How Cassandra determines the splits
Date Tue, 01 May 2012 15:58:14 GMT
I'm having problems in my Cassandra/Hadoop (1.0.8 + cdh3u3) cluster related to how Cassandra
splits the data to be processed by Hadoop.

I'm currently testing a MapReduce job, starting from a CF of roughly 1500 rows, with the following settings:

cassandra.input.split.size 10
cassandra.range.batch.size 1

What I consistently see is that, while most of the tasks are assigned 1-20 rows each,
one of them is assigned 400+ rows, which causes all sorts of problems with timeouts
and memory consumption (not to mention the mapper progress bar going to 4000% and more).
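
As a rough sanity check on these numbers, here is a minimal sketch of the arithmetic. It assumes (this is not verified against the Cassandra source) that the mapper progress is reported as rows read divided by the configured cassandra.input.split.size, and that an even split would put about that many rows in each task. The function names are purely illustrative.

```python
import math

TOTAL_ROWS = 1500  # approximate size of the CF, as stated above
SPLIT_SIZE = 10    # value of cassandra.input.split.size


def expected_splits(total_rows, split_size):
    """Number of splits if rows were divided evenly (an idealization)."""
    return math.ceil(total_rows / split_size)


def progress_percent(rows_read, split_size):
    """Progress in percent, ASSUMING progress = rows read / split size."""
    return 100.0 * rows_read / split_size


if __name__ == "__main__":
    print(expected_splits(TOTAL_ROWS, SPLIT_SIZE))  # 150
    print(progress_percent(400, SPLIT_SIZE))        # 4000.0
```

Under that assumption, a task that reads 400 rows with a split size of 10 would show 4000% progress, which is consistent with the oversized split being about 40 times larger than requested.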

Do you have any suggestions on how to solve or troubleshoot this issue?

Filippo Diotalevi
