cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: MapReduce, Timeouts and Range Batch Size
Date Fri, 23 Apr 2010 13:39:19 GMT
Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
to track this.

On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson <johan@oskarsson.nu> wrote:
> I have written some code to avoid Thrift reconnection; it just keeps the connection open
> between get_range_slices calls.
> I can extract that and put it up, but not until early next week.
>
> /Johan
>
> On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
>
>> That would be an easy win, sure.
>>
>> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
>>> running MapReduce jobs, so I've reduced the Range Batch Size to 256 (from 4096).
>>> This seems to have fixed my problem, although it has slowed things down a
>>> bit, presumably because there are 16x more calls to get_range_slices.
>>> While I was in that code, I noticed that a new client was being created for
>>> each batch get, so decreasing the batch size has also increased this
>>> overhead.  I'm thinking of rewriting ColumnFamilyRecordReader to do some
>>> connection pooling.  Does anyone have any thoughts on that?
>>> joost.
>>>
>
>
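A minimal Python sketch of the trade-off discussed in this thread, not the actual ColumnFamilyRecordReader code. It assumes a hypothetical in-memory `FakeRangeClient` in place of the real Thrift client; the names `FakeServer`, `connect`, and `read_all`, and the simplified `get_range_slices(start_key, count)` signature, are illustrative only and are not Cassandra's API. The reader pages through a sorted key range in batches of `batch_size` and either opens a new connection per batch (the behavior Joost describes) or keeps one connection open across all calls (what Johan describes). The sketch treats the start key as inclusive, so each later batch overlaps the previous one by one key.

```python
import bisect


class FakeRangeClient:
    """Hypothetical in-memory stand-in for a Thrift client (not the real API)."""

    def __init__(self, keys):
        self._keys = keys  # already sorted by FakeServer

    def get_range_slices(self, start_key, count):
        # Return up to `count` keys >= start_key; the start key is inclusive here.
        i = bisect.bisect_left(self._keys, start_key)
        return self._keys[i:i + count]


class FakeServer:
    """Hypothetical server that counts how many connections clients open."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.connections_opened = 0

    def connect(self):
        self.connections_opened += 1
        return FakeRangeClient(self.keys)


def read_all(server, batch_size, reuse_connection):
    """Page through every key in batches; return (keys_read, calls_made)."""
    if batch_size < 2:
        raise ValueError("batch_size must be >= 2 because batches overlap by one key")
    shared = server.connect() if reuse_connection else None
    rows, calls, start = [], 0, ""
    while True:
        # Either reuse the one open connection or open a fresh one per batch.
        client = shared if reuse_connection else server.connect()
        batch = client.get_range_slices(start, batch_size)
        calls += 1
        fetched = len(batch)
        if rows and batch and batch[0] == rows[-1]:
            batch = batch[1:]  # drop the key we already read as the last row
        rows.extend(batch)
        if fetched < batch_size:  # a short batch means the range is exhausted
            return rows, calls
        start = rows[-1]  # next batch starts at (and includes) the last key read


if __name__ == "__main__":
    keys = [f"key{i:05d}" for i in range(10_000)]
    for reuse in (False, True):
        server = FakeServer(keys)
        rows, calls = read_all(server, batch_size=256, reuse_connection=reuse)
        print("reuse:", reuse, "calls:", calls, "connections:", server.connections_opened)
```

With 10,000 keys and a batch size of 256, both runs make 40 `get_range_slices` calls, but the per-batch version opens 40 connections while the reusing version opens 1. Shrinking the batch size multiplies the number of calls, and without connection reuse it multiplies the connection setup cost by the same factor.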
