From: Jonathan Ellis
Date: Thu, 15 Oct 2009 10:40:07 -0500
Subject: Re: Thrift Perl API Timeout Issues
To: cassandra-user@incubator.apache.org

What is the default?

On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani wrote:
> You need to call
> $socket->setRecvTimeout()
> With a higher number in ms.
>
>
> On Oct 15, 2009, at 11:26 AM, Eric Lubow wrote:
>
> Using the Thrift Perl API into Cassandra, I am running into what is
> endearingly referred to as the 4 bytes of doom:
>  TSocket: timed out reading 4 bytes from localhost:9160
> The script I am using is fairly simple. I have a text file that has about
> 3.6 million lines that are formatted like: foo@bar.com 1234
> The Cassandra dataset is a single column family called Users in the Mailings
> keyspace with a data layout of:
> Users = {
>     'foo@example.com': {
>         email: 'foo@example.com',
>         person_id: '123456',
>         send_dates_2009-09-30: '2245',
>         send_dates_2009-10-01: '2247',
>     },
> }
> There are about 3.5 million rows in the Users column family and each row has
> no more than 4 columns (listed above). Some only have 3 (one of the
> send_dates_YYYY-MM-DD isn't there).
> The script parses it and then connects to Cassandra and does a get_slice and
> counts the return values adding that to a hash:
>     my ($value) = $client->get_slice(
>         'Mailings',
>         $email,
>         Cassandra::ColumnParent->new({
>             column_family => 'Users',
>         }),
>         Cassandra::SlicePredicate->new({
>             slice_range => Cassandra::SliceRange->new({
>                 start => 'send_dates_2009-09-29',
>                 finish => 'send_dates_2009-10-30',
>             }),
>         }),
>         Cassandra::ConsistencyLevel::ONE
>     );
>     $counter{($#{$value} + 1)}++;
> For the most part, this script times out after 1 minute or so. Replacing the
> get_slice with a get_count, I can get it to about 2 million queries before I
> get the timeout. Replacing the get_slice with a get, I make it to about 2.5
> million before I get the timeout. The only way I could get it to run all
> the way through was to add a 1/100 of a second sleep during every iteration.
> I was able to get the script to complete when I shut down everything else
> on the machine (and it took 177m to complete). But since this is a
> semi-production machine, I had to turn everything back on afterwards.
> So for poops and laughs (at the recommendation of jbellis), I rewrote the
> script in Python and it has since run (using get_slice) 3 times fully
> without timing out (approximately 130m in Python) with everything else
> running on the machine.
> My question is, having seen this same thing in the PHP API and it is my
> understanding that the Perl API was based on the PHP API,
> could http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here
> too? Is anyone else seeing this issue? If so, have you gotten around it?
> Thanks.
> -e
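
For anyone reading this thread later, the failure mode above can be reproduced without Cassandra or Thrift. The following is only a rough sketch in plain Python (standard library sockets, not the Thrift API): a deliberately slow local server sends a 4-byte integer, loosely mirroring the 4-byte read in the error message, and the client gives up if its receive timeout is shorter than the server's delay. The names slow_server, read_frame_length and try_read, and all of the delay values, are made up for illustration.

```python
import socket
import struct
import threading
import time


def slow_server(listener, delay_s):
    """Accept one connection, wait delay_s seconds, then send 4 bytes."""
    conn, _ = listener.accept()
    with conn:
        time.sleep(delay_s)
        try:
            conn.sendall(struct.pack("!i", 42))
        except OSError:
            # The client may already have given up and closed its end.
            pass


def read_frame_length(port, recv_timeout_ms):
    """Read a 4-byte big-endian integer, giving up after recv_timeout_ms."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.settimeout(recv_timeout_ms / 1000.0)
        data = b""
        while len(data) < 4:
            chunk = sock.recv(4 - len(data))
            if not chunk:
                raise ConnectionError("server closed the connection early")
            data += chunk
        return struct.unpack("!i", data)[0]


def try_read(delay_s, recv_timeout_ms):
    """Run one round trip and return the value read, or 'timed out'."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    thread = threading.Thread(target=slow_server, args=(listener, delay_s))
    thread.start()
    try:
        return read_frame_length(port, recv_timeout_ms)
    except socket.timeout:
        return "timed out"
    finally:
        thread.join()
        listener.close()


if __name__ == "__main__":
    print(try_read(delay_s=0.3, recv_timeout_ms=50))
    print(try_read(delay_s=0.05, recv_timeout_ms=2000))
```

With these made-up numbers the first call gives up and the second gets its 4 bytes. In the Perl client the corresponding knob is the $socket->setRecvTimeout() call mentioned above, which takes a value in milliseconds.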