incubator-cassandra-user mailing list archives

From Joost Ouwerkerk <jo...@openplaces.org>
Subject Re: MapReduce, Timeouts and Range Batch Size
Date Fri, 23 Apr 2010 14:39:45 GMT
Awesome.  In the meantime, I hacked something similar myself.  The
performance difference does not appear to be material.  I think the real
killer is the get_range_slices call.  Relative to that, the cost of getting
the connection appears to be more or less trivial.  What can I do to
alleviate that cost?  CASSANDRA-821 looks interesting; can I apply that to
0.6.1?
joost.
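Here is a rough sketch of the idea discussed in this thread: keep one Thrift connection open across get_range_slices calls instead of creating a new client for every batch. `Connection`, `ReusingReader` and `fakeConnection` are hypothetical names invented for illustration. They are not Cassandra's or Thrift's API. The fake connection just returns synthetic keys so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ConnectionReuseSketch {
    /** Hypothetical stand-in for a Thrift client; not Cassandra's real API. */
    interface Connection extends AutoCloseable {
        List<String> getRangeSlices(String startKey, int count);

        @Override
        void close();  // narrowed: throws no checked exception
    }

    /** Opens one connection on first use and reuses it for every batch. */
    static final class ReusingReader implements AutoCloseable {
        private final Supplier<Connection> connect;
        private Connection conn;  // null until the first batch is requested

        ReusingReader(Supplier<Connection> connect) {
            this.connect = connect;
        }

        List<String> nextBatch(String startKey, int count) {
            if (conn == null) {
                conn = connect.get();  // connect once, not once per batch
            }
            return conn.getRangeSlices(startKey, count);
        }

        @Override
        public void close() {
            if (conn != null) {
                conn.close();
                conn = null;
            }
        }
    }

    // Counters so the demo can show how many connections were created.
    static int opened = 0;
    static int closed = 0;

    /** Hypothetical connection factory returning synthetic keys. */
    static Connection fakeConnection() {
        opened++;
        return new Connection() {
            @Override
            public List<String> getRangeSlices(String startKey, int count) {
                List<String> keys = new ArrayList<>();
                for (int i = 0; i < count; i++) {
                    keys.add(startKey + i);
                }
                return keys;
            }

            @Override
            public void close() {
                closed++;
            }
        };
    }

    public static void main(String[] args) {
        try (ReusingReader reader = new ReusingReader(ConnectionReuseSketch::fakeConnection)) {
            for (int batch = 0; batch < 16; batch++) {
                reader.nextBatch("key", 256);
            }
        }
        System.out.println("connections opened: " + opened);
    }
}
```

Running it prints `connections opened: 1` for 16 batches. A client created per batch would open 16. Joost notes above that this only removes connection setup. It does nothing about the cost of get_range_slices itself.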

On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
> to track this.
>
> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson <johan@oskarsson.nu> wrote:
> > I have written some code to avoid Thrift reconnection; it just keeps the
> > connection open between get_range_slices calls.
> > I can extract that and put it up, but not until early next week.
> >
> > /Johan
> >
> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
> >
> >> That would be an easy win, sure.
> >>
> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit()
> >>> when MapReducing.  So I've reduced the Range Batch Size to 256 (from
> >>> 4096), and this seems to have fixed my problem, although it has slowed
> >>> things down a bit, presumably because there are 16x more calls to
> >>> get_range_slices.  While I was in that code I noticed that a new client
> >>> was being created for each batch get.  By decreasing the batch size, I've
> >>> increased this overhead.  I'm thinking of rewriting
> >>> ColumnFamilyRecordReader to do some connection pooling.  Anyone have any
> >>> thoughts on that?
> >>> joost.
> >>>
> >
> >
>
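This minimal paging loop runs against an in-memory stand-in. It illustrates why shrinking the Range Batch Size from 4096 to 256 multiplies the number of get_range_slices calls. `RangeSource`, `InMemorySource` and `readAll` are hypothetical. The sketch assumes each request's start key is inclusive. Under that assumption, every page after the first repeats the previous page's last key, so that key is skipped. A start-exclusive, token-based range would not need the skip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RangePagingSketch {
    /** Hypothetical stand-in for get_range_slices: up to count keys >= startKey. */
    interface RangeSource {
        List<String> getRangeSlices(String startKey, int count);
    }

    /** In-memory source that counts how many range calls it served. */
    static final class InMemorySource implements RangeSource {
        private final TreeMap<String, String> rows = new TreeMap<>();
        int calls = 0;

        InMemorySource(int n) {
            for (int i = 0; i < n; i++) {
                rows.put(String.format("key%06d", i), "value" + i);
            }
        }

        @Override
        public List<String> getRangeSlices(String startKey, int count) {
            calls++;
            List<String> page = new ArrayList<>();
            for (String key : rows.tailMap(startKey, true).keySet()) {  // inclusive start
                if (page.size() == count) {
                    break;
                }
                page.add(key);
            }
            return page;
        }
    }

    /** Reads every key, batchSize rows per call, restarting from the last key seen. */
    static List<String> readAll(RangeSource source, int batchSize) {
        if (batchSize < 2) {
            throw new IllegalArgumentException("batchSize must be >= 2");
        }
        List<String> keys = new ArrayList<>();
        String start = "";
        String lastKey = null;
        while (true) {
            List<String> page = source.getRangeSlices(start, batchSize);
            for (String key : page) {
                if (!key.equals(lastKey)) {  // skip the repeated boundary key
                    keys.add(key);
                }
            }
            if (page.size() < batchSize) {
                break;  // a short page means the range is exhausted
            }
            lastKey = page.get(page.size() - 1);
            start = lastKey;
        }
        return keys;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {4096, 256}) {
            InMemorySource source = new InMemorySource(100_000);
            List<String> keys = readAll(source, batchSize);
            System.out.println("batch " + batchSize + ": " + keys.size()
                    + " rows in " + source.calls + " calls");
        }
    }
}
```

Each call returns at most `batchSize` rows, so reading N rows takes roughly N / batchSize calls. Because of the overlapping boundary row, each page after the first adds only `batchSize - 1` new rows. The ratio between the two settings therefore comes out close to 16, not exactly 16. Smaller batches keep each call short enough to stay under the client timeout, but they pay any per-call overhead more often. That is why creating a new client per batch hurts more as the batch shrinks.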
