cassandra-user mailing list archives

From Eric Lubow <eric.lu...@gmail.com>
Subject Re: Thrift Perl API Timeout Issues
Date Thu, 15 Oct 2009 15:48:16 GMT
My connection section of the script is here:

 # Connect to the database
 my $socket = new Thrift::Socket('localhost',9160);
 $socket->setSendTimeout(2500);
 $socket->setRecvTimeout(7500);
 my $transport = new Thrift::BufferedTransport($socket,2048,2048);
 my $protocol = new Thrift::BinaryProtocol($transport);
 my $client = Cassandra::CassandraClient->new($protocol);

I also tried combinations using 1024 as the buffer size, 1000 as the
SendTimeout, and 5000 as the RecvTimeout.
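
For illustration, here is a minimal sketch of retrying a timed-out call after a
short pause instead of letting the whole run die. It assumes the timeout
surfaces as an exception that eval can catch. The helper name with_retry and
the stub $flaky_get are hypothetical and are not part of Thrift or the
Cassandra bindings. The stub only imitates the error text so the example runs
without a server.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # sub-second sleep

# Hypothetical helper: run $code, retrying up to $max_tries times and
# pausing $pause seconds between attempts whenever it dies.
sub with_retry {
    my ($code, $max_tries, $pause) = @_;
    my $last_error;
    for my $try (1 .. $max_tries) {
        my @result = eval { $code->() };
        return @result unless $@;          # success: hand back the result
        $last_error = $@;
        sleep($pause) if $try < $max_tries;
    }
    die "giving up after $max_tries tries: $last_error";
}

# Stand-in for a Thrift call (an assumption for this sketch): fails twice
# with the error text from the report, then succeeds.
my $calls = 0;
my $flaky_get = sub {
    $calls++;
    die "TSocket: timed out reading 4 bytes from localhost:9160\n"
        if $calls < 3;
    return 'ok';
};

my ($result) = with_retry($flaky_get, 5, 0.01);
print "result=$result after $calls calls\n";
```

In a real script the code ref would wrap the get_slice call. Whether the
socket is still usable after a timeout is something I have not verified, so
reopening the transport before retrying may be necessary.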

-e

On Thu, Oct 15, 2009 at 11:42 AM, Jake Luciani <jakers@gmail.com> wrote:

> I think it's 100ms. I need to increase it to match python I guess.
>
> Sent from my iPhone
>
>
> On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>> What is the default?
>>
>> On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <jakers@gmail.com> wrote:
>>
>>> You need to call
>>> $socket->setRecvTimeout()
>>> With a higher number in ms.
>>>
>>>
>>> On Oct 15, 2009, at 11:26 AM, Eric Lubow <eric.lubow@gmail.com> wrote:
>>>
>>> Using the Thrift Perl API into Cassandra, I am running into what is
>>> endearingly referred to as the 4 bytes of doom:
>>>  TSocket: timed out reading 4 bytes from localhost:9160
>>> The script I am using is fairly simple.  I have a text file that has about
>>> 3.6 million lines that are formatted like:  foo@bar.com  1234
>>> The Cassandra dataset is a single column family called Users in the
>>> Mailings keyspace with a data layout of:
>>> Users = {
>>>    'foo@example.com': {
>>>        email: 'foo@example.com',
>>>        person_id: '123456',
>>>        send_dates_2009-09-30: '2245',
>>>        send_dates_2009-10-01: '2247',
>>>    },
>>> }
>>> There are about 3.5 million rows in the Users column family and each row
>>> has no more than 4 columns (listed above).  Some only have 3 (one of the
>>> send_dates_YYYY-MM-DD columns isn't there).
>>> The script parses the file, then connects to Cassandra, does a get_slice,
>>> and counts the return values, adding that to a hash:
>>>     my ($value) = $client->get_slice(
>>>         'Mailings',
>>>         $email,
>>>         Cassandra::ColumnParent->new({
>>>                 column_family => 'Users',
>>>             }),
>>>         Cassandra::SlicePredicate->new({
>>>                 slice_range => Cassandra::SliceRange->new({
>>>                         start => 'send_dates_2009-09-29',
>>>                         finish => 'send_dates_2009-10-30',
>>>                     }),
>>>             }),
>>>         Cassandra::ConsistencyLevel::ONE
>>>     );
>>>     $counter{($#{$value} + 1)}++;
>>> For the most part, this script times out after 1 minute or so. Replacing
>>> the get_slice with a get_count, I can get it to about 2 million queries
>>> before I get the timeout.  Replacing the get_slice with a get, I make it
>>> to about 2.5 million before I get the timeout.  The only way I could get
>>> it to run all the way through was to add a 1/100 of a second sleep during
>>> every iteration.
>>> I was able to get the script to complete when I shut down everything else
>>> on the machine (and it took 177m to complete).  But since this is a
>>> semi-production machine, I had to turn everything back on afterwards.
>>> So for poops and laughs (at the recommendation of jbellis), I rewrote the
>>> script in Python and it has since run (using get_slice) 3 times fully
>>> without timing out (approximately 130m in Python) with everything else
>>> running on the machine.
>>> My question is: I have seen this same thing in the PHP API, and it is my
>>> understanding that the Perl API was based on the PHP API, so could
>>> http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here too?
>>> Is anyone else seeing this issue?  If so, have you gotten around it?
>>> Thanks.
>>> -e
>>>
>>
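
The counting step in the quoted script, $counter{($#{$value} + 1)}++, can be
tried in isolation. The sketch below uses made-up array refs in place of real
get_slice results (the data is an assumption for illustration only) and builds
the same kind of histogram: number of columns returned mapped to number of
rows.

```perl
use strict;
use warnings;

# Made-up stand-ins for get_slice results: each array ref plays the role
# of the column list returned for one row key.
my @fake_results = (
    [ 'send_dates_2009-09-30', 'send_dates_2009-10-01' ],
    [ 'send_dates_2009-09-30' ],
    [],
    [ 'send_dates_2009-09-30', 'send_dates_2009-10-01' ],
);

my %counter;
for my $value (@fake_results) {
    # scalar(@$value) is the element count, the same number as $#{$value} + 1
    $counter{ scalar @$value }++;
}

# Print the histogram in ascending order of column count.
for my $n (sort { $a <=> $b } keys %counter) {
    print "$n column(s): $counter{$n} row(s)\n";
}
```

An empty result is counted under key 0, so rows with no send_dates columns in
the requested range still show up in the totals.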
