cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: MapReduce, Timeouts and Range Batch Size
Date Fri, 23 Apr 2010 14:47:15 GMT
You could look into it, but it's not going to be an easy backport
since SSTableReader and SSTableScanner got split into two classes in
trunk.

On Fri, Apr 23, 2010 at 9:39 AM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> Awesome.  In the meantime, I hacked something similar myself.  The
> performance difference does not appear to be material.  I think the real
> killer is the get_range_slices call.  Relative to that, the cost of getting
> the connection appears to be more or less trivial.  What can I do to
> alleviate that cost?  CASSANDRA-821 looks interesting -- can I apply that to
> 0.6.1?
> joost.
> On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
>> to track this.
>>
>> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson <johan@oskarsson.nu> wrote:
>> > I have written some code to avoid Thrift reconnection; it just keeps the
>> > connection open between get_range_slices calls.
>> > I can extract that and put it up, but not until early next week.
>> >
>> > /Johan
>> >
>> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
>> >
>> >> That would be an easy win, sure.
>> >>
>> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
>> >>> MapReducing.  So I've reduced the Range Batch Size to 256 (from 4096) and
>> >>> this seems to have fixed my problem, although it has slowed things down a
>> >>> bit -- presumably because there are 16x more calls to get_range_slices.
>> >>> While I was in that code I noticed that a new client was being created for
>> >>> each batch get.  By decreasing the batch size, I've increased this
>> >>> overhead.  I'm thinking of re-writing ColumnFamilyRecordReader to do some
>> >>> connection pooling.  Anyone have any thoughts on that?
>> >>> joost.
>> >>>
>> >
>> >
>
>
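
The trade-off discussed in this thread (smaller batches mean more
get_range_slices calls, and a new client per call multiplies the connection
overhead) can be sketched with a small simulation. This is a hypothetical
Python model, not Cassandra's ColumnFamilyRecordReader, which is Java:
FakeClient, read_all and the 65,536-row total are names and numbers assumed
purely for illustration, and no real Thrift connection is made.

```python
class FakeClient:
    """Hypothetical stand-in for a Thrift client; it only serves row indices."""

    def get_range_slices(self, start, count, total_rows):
        # Return the row indices in [start, start + count), capped at total_rows.
        return list(range(start, min(start + count, total_rows)))


def read_all(total_rows, batch_size, reuse_client):
    """Read all rows in batches.

    Returns (rows_read, calls_made, clients_created).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    client = None
    rows_read = calls_made = clients_created = 0
    start = 0
    while start < total_rows:
        # Without reuse, a fresh client is created for every batch, which is
        # the behaviour described in the thread. With reuse, the client is
        # created once and kept open between calls.
        if client is None or not reuse_client:
            client = FakeClient()
            clients_created += 1
        batch = client.get_range_slices(start, batch_size, total_rows)
        calls_made += 1
        rows_read += len(batch)
        start += len(batch)
    return rows_read, calls_made, clients_created


if __name__ == "__main__":
    total = 65536  # assumed row count, chosen only to make the arithmetic clean
    for batch_size, reuse in [(4096, False), (256, False), (256, True)]:
        rows, calls, clients = read_all(total, batch_size, reuse)
        print(f"batch={batch_size} reuse={reuse}: "
              f"rows={rows} calls={calls} clients={clients}")
```

Under these assumptions, dropping the batch size from 4096 to 256 raises the
call count from 16 to 256 (the 16x mentioned above), and keeping the client
open cuts the number of clients created from 256 to 1. The sketch does not
model per-call latency, so it says nothing about whether the connection cost
or the get_range_slices call itself dominates.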
