From: Jonathan Ellis
Date: Fri, 23 Apr 2010 09:47:15 -0500
Subject: Re: MapReduce, Timeouts and Range Batch Size
To: user@cassandra.apache.org

You could look into it, but it's not going to be an easy backport, since SSTableReader and SSTableScanner got split into two classes in trunk.

On Fri, Apr 23, 2010 at 9:39 AM, Joost Ouwerkerk wrote:
> Awesome. In the meantime, I hacked something similar myself. The
> performance difference does not appear to be material. I think the real
> killer is the get_range_slices call. Relative to that, the cost of getting
> the connection appears to be more or less trivial. What can I do to
> alleviate that cost? CASSANDRA-821 looks interesting -- can I apply that to
> 0.6.1?
> joost.
> On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis wrote:
>>
>> Great! Created https://issues.apache.org/jira/browse/CASSANDRA-1017
>> to track this.
>>
>> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson wrote:
>> > I have written some code to avoid Thrift reconnection; it just keeps the
>> > connection open between get_range_slices calls.
>> > I can extract that and put it up, but not until early next week.
>> >
>> > /Johan
>> >
>> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
>> >
>> >> That would be an easy win, sure.
>> >>
>> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk wrote:
>> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit()
>> >>> when MapReducing. So I've reduced the Range Batch Size to 256 (from
>> >>> 4096) and this seems to have fixed my problem, although it has slowed
>> >>> things down a bit -- presumably because there are 16x more calls to
>> >>> get_range_slices.
>> >>> While I was in that code I noticed that a new client was being created
>> >>> for each batch get. By decreasing the batch size, I've increased this
>> >>> overhead. I'm thinking of re-writing ColumnFamilyRecordReader to do some
>> >>> connection pooling. Anyone have any thoughts on that?
>> >>> joost.
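The trade-off in this thread (a smaller Range Batch Size means more get_range_slices calls, and creating a new client per batch multiplies connection setup) can be made concrete with a small simulation. This is a minimal, hypothetical Python sketch, not ColumnFamilyRecordReader or Cassandra's Thrift interface: FakeClient, get_range_after, ClientPool, read_split and compare are invented for illustration, and the fake client assumes an exclusive start key to keep paging simple. It counts how many range calls a split needs at a given batch size, and how many clients get opened when a new one is created per batch versus when one is kept and reused through a simple pool.

```python
import bisect
from typing import Callable, List, Tuple


class FakeClient:
    """Hypothetical stand-in for a Thrift client; NOT Cassandra's real API."""

    def __init__(self, rows: List[str]):
        self._rows = rows  # sorted row keys held by this fake "node"

    def get_range_after(self, start: str, count: int) -> List[str]:
        # Assumed semantics: up to `count` keys strictly greater than `start`.
        i = bisect.bisect_right(self._rows, start)
        return self._rows[i:i + count]


class ClientPool:
    """Minimal pool: reuses an idle client, creating one only when none is idle."""

    def __init__(self, factory: Callable[[], FakeClient]):
        self._factory = factory
        self._idle: List[FakeClient] = []
        self.created = 0  # how many clients (connections) were ever opened

    def borrow(self) -> FakeClient:
        if self._idle:
            return self._idle.pop()
        self.created += 1
        return self._factory()

    def give_back(self, client: FakeClient) -> None:
        self._idle.append(client)


def read_split(get_client: Callable[[], FakeClient],
               release: Callable[[FakeClient], None],
               batch_size: int) -> Tuple[List[str], int]:
    """Page through every key in batches; return (keys, number of range calls)."""
    keys: List[str] = []
    calls = 0
    start = ""  # sorts before every non-empty key
    while True:
        client = get_client()  # acquired once per batch
        try:
            batch = client.get_range_after(start, batch_size)
        finally:
            release(client)
        calls += 1
        keys.extend(batch)
        if len(batch) < batch_size:  # a short (or empty) batch ends the range
            return keys, calls
        start = batch[-1]  # next page begins after the last key seen


def compare(num_rows: int, batch_size: int) -> dict:
    """Count range calls and clients opened: new client per batch vs. pooled."""
    # Zero-padded to 6 digits, so the list is sorted for num_rows <= 1_000_000.
    rows = [f"key{i:06d}" for i in range(num_rows)]
    opened = [0]

    def open_new() -> FakeClient:
        opened[0] += 1
        return FakeClient(rows)

    # Per-batch: a fresh client for every batch, discarded afterwards.
    keys_a, calls = read_split(open_new, lambda client: None, batch_size)
    # Pooled: the same client is borrowed and returned on every batch.
    pool = ClientPool(lambda: FakeClient(rows))
    keys_b, _ = read_split(pool.borrow, pool.give_back, batch_size)

    assert keys_a == keys_b == rows  # sanity check: every row seen exactly once
    return {"calls": calls, "per_batch_clients": opened[0], "pooled_clients": pool.created}


for size in (4096, 256):
    print(size, compare(100_000, size))
```

With 100,000 rows this reports 25 range calls at a batch size of 4096 and 391 at 256 (about 15.6x more, close to the 16x estimated in the thread). The per-batch strategy opens one client per call, while the pool opens one in total. A range whose size is an exact multiple of batch_size costs one extra, empty call. The sketch only counts calls and connections, not their latency; Joost's measurements in the thread suggest the get_range_slices call itself dominates.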