incubator-cassandra-user mailing list archives

From Joost Ouwerkerk
Subject MapReduce, Timeouts and Range Batch Size
Date Fri, 23 Apr 2010 02:27:52 GMT
I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
running MapReduce jobs.  So I've reduced the Range Batch Size to 256 (from
4096) and this seems to have fixed my problem, although it has slowed things
down a bit, presumably because there are now 16x as many calls to
get_range_slices.  While I was in that code I noticed that a new client is
created for each batch get, so by decreasing the batch size I've also
increased that overhead.  I'm thinking of rewriting ColumnFamilyRecordReader
to do some connection pooling.  Does anyone have any thoughts on that?
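
To make the overhead concrete, here is a rough sketch in Java of the arithmetic and of the pooling idea. It is not the actual ColumnFamilyRecordReader code: FakeClient and SimplePool are hypothetical stand-ins for a real client connection and pool, and the 65536-row split is an assumed figure chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class ClientReuseSketch {
    /** Hypothetical stand-in for a client connection; not a Cassandra or Thrift class. */
    static final class FakeClient { }

    /** Minimal pool (an assumption for illustration): reuse an idle client if one exists. */
    static final class SimplePool<T> {
        private final Deque<T> idle = new ArrayDeque<>();
        private final Supplier<T> factory;

        SimplePool(Supplier<T> factory) { this.factory = factory; }

        synchronized T borrow() {
            T client = idle.pollFirst();          // null when no idle client is available
            return client != null ? client : factory.get();
        }

        synchronized void giveBack(T client) { idle.addFirst(client); }
    }

    /** Number of batch calls needed to read totalRows rows (ceiling division). */
    public static int batchCalls(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    /** Clients opened when every batch call creates its own client. */
    public static int clientsOpenedPerBatch(int totalRows, int batchSize) {
        AtomicInteger opened = new AtomicInteger();
        int calls = batchCalls(totalRows, batchSize);
        for (int i = 0; i < calls; i++) {
            opened.incrementAndGet();
            new FakeClient();                     // one fresh client per batch
        }
        return opened.get();
    }

    /** Clients opened when sequential batch calls borrow from a pool. */
    public static int clientsOpenedPooled(int totalRows, int batchSize) {
        AtomicInteger opened = new AtomicInteger();
        SimplePool<FakeClient> pool = new SimplePool<>(() -> {
            opened.incrementAndGet();
            return new FakeClient();
        });
        int calls = batchCalls(totalRows, batchSize);
        for (int i = 0; i < calls; i++) {
            FakeClient client = pool.borrow();
            try {
                // the batch call would be issued with this client here
            } finally {
                pool.giveBack(client);            // return it so the next batch reuses it
            }
        }
        return opened.get();
    }

    public static void main(String[] args) {
        int rows = 65536;                         // assumed split size, for illustration only
        System.out.println("calls at 4096: " + batchCalls(rows, 4096));
        System.out.println("calls at 256:  " + batchCalls(rows, 256));
        System.out.println("clients, one per batch: " + clientsOpenedPerBatch(rows, 256));
        System.out.println("clients, pooled:        " + clientsOpenedPooled(rows, 256));
    }
}
```

With sequential batches a single reused client is enough, so the number of connections opened no longer grows as the batch size shrinks. A real pool would also have to deal with broken connections and timeouts, which this sketch ignores.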
