From: Jake Luciani <jakers@gmail.com>
To: cassandra-user@incubator.apache.org
Subject: Re: Thrift Perl API Timeout Issues
Date: Thu, 15 Oct 2009 12:19:14 -0400
What happens if you set the RecvTimeout to 100000 (i.e. 100 seconds)?

On Oct 15, 2009, at 11:48 AM, Eric Lubow <eric.lubow@gmail.com> wrote:

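Here is the connection section of my script:
    use Thrift::Socket;
    use Thrift::BufferedTransport;
    use Thrift::BinaryProtocol;
    use Cassandra::Cassandra;    # Thrift-generated client; module path may differ

    # Connect to the database
    my $socket = Thrift::Socket->new('localhost', 9160);
    $socket->setSendTimeout(2500);    # milliseconds
    $socket->setRecvTimeout(7500);    # milliseconds
    my $transport = Thrift::BufferedTransport->new($socket, 2048, 2048);
    my $protocol  = Thrift::BinaryProtocol->new($transport);
    my $client    = Cassandra::CassandraClient->new($protocol);
    $transport->open();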

I also tried combinations using 1024 for the buffer sizes, 1000 for the SendTimeout, and 5000 for the RecvTimeout.

-e

On Thu, Oct 15, 2009 at 11:42 AM, Jake Luciani <jakers@gmail.com> wrote:
I think the default is 100 ms. I guess I need to increase it to match Python.

On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

What is the default?

On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <jakers@gmail.com> wrote:
You need to call
$socket->setRecvTimeout()
with a higher value, in milliseconds.
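For context on the error message, here is a minimal, self-contained Perl sketch of a read with a receive timeout. It is modeled loosely on how Thrift's Perl socket appears to wait for data (IO::Select can_read with the millisecond timeout divided by 1000); the helper read_all_with_timeout is hypothetical, and a local socketpair stands in for the Cassandra connection. Assuming the first thing the client reads back is the 4-byte message header, a server that takes longer than RecvTimeout to start replying produces exactly a "timed out reading 4 bytes" failure, which is why raising the timeout can help.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: read exactly $len bytes from $fh, waiting at most
# $timeout_ms for data before each read; dies on timeout.
sub read_all_with_timeout {
    my ($fh, $len, $timeout_ms) = @_;
    my $sel = IO::Select->new($fh);
    my $buf = '';
    while (length($buf) < $len) {
        # can_read() takes seconds; the Thrift setters take milliseconds.
        my @ready = $sel->can_read($timeout_ms / 1000);
        die "timed out reading $len bytes\n" unless @ready;
        my $got = sysread($fh, $buf, $len - length($buf), length($buf));
        die "read error: $!\n" unless defined $got;
        die "connection closed\n" if $got == 0;
    }
    return $buf;
}

# A connected pair of local sockets stands in for client and server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Case 1: the "server" has not replied, so a 100 ms wait expires.
my $err = eval { read_all_with_timeout($client, 4, 100); 1 } ? '' : $@;
print "slow server: $err";

# Case 2: a 4-byte reply is already waiting, so the read succeeds.
syswrite($server, pack('N', 42));
my $reply = read_all_with_timeout($client, 4, 100);
print 'fast server: got ', unpack('N', $reply), "\n";
```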

On Oct 15, 2009, at 11:26 AM, Eric Lubow <eric.lubow@gmail.com> wrote:

Using the Thrift Perl API with Cassandra, I am running into what is
endearingly referred to as the "4 bytes of doom":
 TSocket: timed out reading 4 bytes from localhost:9160
The script I am using is fairly simple. It reads a text file of about
3.6 million lines, each formatted like:  foo@bar.com  1234
The Cassandra dataset is a single column family called Users in the Mailings
keyspace with a data layout of:
Users = {
   'foo@example.com': {
       email: 'foo@example.com',
       person_id: '123456',
       send_dates_2009-09-30: '2245',
       send_dates_2009-10-01: '2247',
   },
}
There are about 3.5 million rows in the Users column family, and each row has
no more than 4 columns (listed above). Some have only 3 (one of the
send_dates_YYYY-MM-DD columns is missing).
The script parses each line, connects to Cassandra, does a get_slice, and
counts the returned columns, adding the tally to a hash:
    # Fetch this row's send_dates_* columns (get_slice returns an arrayref)
    my $value = $client->get_slice(
        'Mailings',                                   # keyspace
        $email,                                       # row key
        Cassandra::ColumnParent->new({ column_family => 'Users' }),
        Cassandra::SlicePredicate->new({
            slice_range => Cassandra::SliceRange->new({
                start  => 'send_dates_2009-09-29',
                finish => 'send_dates_2009-10-30',
            }),
        }),
        Cassandra::ConsistencyLevel::ONE,
    );
    # Tally rows by the number of columns returned
    $counter{ scalar @$value }++;
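The loop body above can be checked offline. The sketch below replaces Cassandra with an in-memory hash (the sample rows are hypothetical, shaped like the Users layout in this message) and assumes the column family compares column names as plain byte strings; the actual comparator is not stated in the thread. It shows why the slice range returns only the send_dates_* columns (email and person_id sort before "send_dates_...") and uses scalar @cols, which is equivalent to $#{$value} + 1. The helper slice_names is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample rows shaped like the Users layout above.
my %users = (
    'foo@example.com' => {
        email                   => 'foo@example.com',
        person_id               => '123456',
        'send_dates_2009-09-30' => '2245',
        'send_dates_2009-10-01' => '2247',
    },
    'bar@example.com' => {    # hypothetical row missing one send date
        email                   => 'bar@example.com',
        person_id               => '654321',
        'send_dates_2009-09-30' => '2245',
    },
);

# Return the column names of $row that fall within [$start, $finish],
# assuming the column family sorts names as plain byte strings.
sub slice_names {
    my ($row, $start, $finish) = @_;
    return grep { $_ ge $start && $_ le $finish } sort keys %$row;
}

# Tally rows by how many columns the slice returns, as the script does.
my %counter;
for my $email (sort keys %users) {
    my @cols = slice_names($users{$email},
        'send_dates_2009-09-29', 'send_dates_2009-10-30');
    $counter{ scalar @cols }++;
}
print "$_ columns: $counter{$_} row(s)\n" for sort keys %counter;
```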
Most of the time, this script times out after a minute or so. Replacing the
get_slice with a get_count, I can get to about 2 million queries before the
timeout. Replacing the get_slice with a get, I make it to about 2.5 million
before the timeout. The only way I could get it to run all the way through
was to add a 1/100-second sleep on every iteration.
I was able to get the script to complete when I shut down everything else
on the machine (it took 177 minutes). But since this is a semi-production
machine, I had to turn everything back on afterwards.
So for poops and laughs (at the recommendation of jbellis), I rewrote the
script in Python. Using get_slice, it has since run to completion 3 times
without timing out (approximately 130 minutes in Python) with everything
else running on the machine.
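The 1/100-second throttle can be added to a Perl loop with Time::HiRes. A minimal sketch, assuming the "email, whitespace, number" input format described earlier; process_throttled and the stub lookup callback are hypothetical names, and a real script would call $client->get_slice inside the callback.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: call $lookup->($email, $id) once per input line,
# pausing $pause_s seconds after each call (0.01 s = 1/100 of a second).
sub process_throttled {
    my ($fh, $lookup, $pause_s) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        chomp $line;
        # Lines look like "foo@bar.com  1234": email, whitespace, number.
        my ($email, $id) = split ' ', $line;
        next unless defined $id;    # skip blank or malformed lines
        $lookup->($email, $id);
        $n++;
        sleep($pause_s) if $pause_s > 0;
    }
    return $n;
}

# Demo: an in-memory file and a stub lookup standing in for get_slice.
my $input = "foo\@bar.com  1234\nbaz\@bar.com  5678\n";
open my $fh, '<', \$input or die "open: $!";
my @seen;
my $t0    = time;
my $count = process_throttled($fh, sub { push @seen, $_[0] }, 0.01);
printf "%d lookups in %.3f s\n", $count, time() - $t0;
```

Keep the cost in mind: over 3.6 million lines, a 10 ms pause adds about 36,000 seconds (10 hours) of sleep on its own.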
My question: I have seen this same thing with the PHP API, and it is my
understanding that the Perl API was based on the PHP API. Could
http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here too?
Is anyone else seeing this issue? If so, have you found a way around it?
Thanks.
-e
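One possible workaround while the timeout question is open, sketched in Perl under stated assumptions: wrap each read in a retry helper that reconnects after a timeout. call_with_retry and error_text are hypothetical helpers, and the assumption that a Thrift exception is either a plain string or a hash-based object with a message field should be checked against your Thrift version. Reconnecting, rather than simply retrying on the same socket, is probably the safer choice, since a late reply to the timed-out request could otherwise be read as the answer to the next one. get_slice is a read, so re-issuing it is harmless.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);
use Time::HiRes qw(sleep);

# Assumed error shapes: a plain string, or a hash-based exception object
# that keeps its text in {message}.
sub error_text {
    my ($err) = @_;
    return $err->{message} // '' if ref $err && reftype($err) eq 'HASH';
    return "$err";
}

# Run $call; on a timeout, run $reconnect and try again, up to $max_tries
# attempts, sleeping a little longer after each failure.
sub call_with_retry {
    my ($call, $reconnect, $max_tries, $delay_s) = @_;
    for my $attempt (1 .. $max_tries) {
        my $result = eval { $call->() };
        my $err = $@;
        return $result unless $err;
        die $err if error_text($err) !~ /timed out/ || $attempt == $max_tries;
        $reconnect->();
        sleep($delay_s * $attempt);    # simple linear backoff
    }
}

# Demo with a fake call that times out twice, then succeeds.
my $failures_left = 2;
my $reconnects    = 0;
my $value = call_with_retry(
    sub {
        die +{ message => 'TSocket: timed out reading 4 bytes from localhost:9160' }
            if $failures_left-- > 0;
        return [ 'col1', 'col2' ];
    },
    sub { $reconnects++ },    # a real script would reopen the transport here
    5, 0.01,
);
print scalar(@$value), " columns after $reconnects reconnects\n";
```

In a real script, $call would wrap $client->get_slice(...) and $reconnect would close and reopen $transport.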
