incubator-cassandra-user mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: Thrift Perl API Timeout Issues
Date Thu, 15 Oct 2009 15:42:39 GMT
I think it's 100 ms. I guess I need to increase it to match Python.

Sent from my iPhone
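The receive timeout is how long the client waits for the server's reply before giving up. The sketch below imitates that wait using core Perl modules only; it does not load the Thrift library. `read_exact_with_timeout` is a hypothetical helper that fails when no data arrives within a millisecond budget, which is roughly the situation the "timed out reading 4 bytes" error describes. With the real client, the budget would be the value passed to `$socket->setRecvTimeout()`. The 100 ms figure used here simply mirrors the guessed default.

```perl
use strict;
use warnings;
use Socket;
use IO::Select;

# Hypothetical helper: read exactly $len bytes from $fh, giving up when no
# data becomes readable within $timeout_ms milliseconds (the unit that
# setRecvTimeout() takes).
sub read_exact_with_timeout {
    my ($fh, $len, $timeout_ms) = @_;
    my $sel = IO::Select->new($fh);
    my $buf = '';
    while (length($buf) < $len) {
        # can_read() takes seconds, so convert from milliseconds.
        my @ready = $sel->can_read($timeout_ms / 1000);
        die "timed out reading $len bytes\n" unless @ready;
        my $n = sysread($fh, $buf, $len - length($buf), length($buf));
        die "read failed: $!\n" unless defined $n;
        die "connection closed\n" if $n == 0;
    }
    return $buf;
}

# A connected pair of sockets stands in for the client/server connection.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# 1. The "server" has not replied yet, so a 100 ms budget runs out.
my $err = eval { read_exact_with_timeout($client, 4, 100); 1 } ? '' : $@;
print "short timeout: $err";    # prints "short timeout: timed out reading 4 bytes"

# 2. Once the reply is on the wire, a larger budget returns it.
syswrite($server, pack('N', 42));
my $bytes = read_exact_with_timeout($client, 4, 5000);
printf "got %d bytes, value %d\n", length($bytes), unpack('N', $bytes);
```

Raising the value only helps if the server eventually answers. Under heavy load, it trades an immediate error for a longer wait.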

On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> What is the default?
>
> On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <jakers@gmail.com> wrote:
>> You need to call $socket->setRecvTimeout() with a higher value, in
>> milliseconds.
>>
>>
>> On Oct 15, 2009, at 11:26 AM, Eric Lubow <eric.lubow@gmail.com> wrote:
>>
>> Using the Thrift Perl API with Cassandra, I am running into what is
>> endearingly referred to as the 4 bytes of doom:
>>  TSocket: timed out reading 4 bytes from localhost:9160
>> The script I am using is fairly simple. I have a text file with about
>> 3.6 million lines formatted like:  foo@bar.com  1234
>> The Cassandra dataset is a single column family called Users in the
>> Mailings keyspace, with a data layout of:
>> Users = {
>>     'foo@example.com': {
>>         email: 'foo@example.com',
>>         person_id: '123456',
>>         send_dates_2009-09-30: '2245',
>>         send_dates_2009-10-01: '2247',
>>     },
>> }
>> There are about 3.5 million rows in the Users column family, and each row
>> has no more than 4 columns (listed above). Some have only 3 (one of the
>> send_dates_YYYY-MM-DD columns isn't there).
>> The script parses each line, connects to Cassandra, does a get_slice, and
>> counts the returned columns, tallying the counts in a hash:
>>      my ($value) = $client->get_slice(
>>          'Mailings',
>>          $email,
>>          Cassandra::ColumnParent->new({
>>                  column_family => 'Users',
>>              }),
>>          Cassandra::SlicePredicate->new({
>>                  slice_range => Cassandra::SliceRange->new({
>>                          start => 'send_dates_2009-09-29',
>>                          finish => 'send_dates_2009-10-30',
>>                      }),
>>              }),
>>          Cassandra::ConsistencyLevel::ONE
>>      );
>>      $counter{($#{$value} + 1)}++;
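The get_slice call asks only for columns whose names fall between send_dates_2009-09-29 and send_dates_2009-10-30 (inclusive), so email and person_id are never returned and only the send_dates_* columns in that window are counted. Below is a minimal in-memory sketch of that selection and of the tally. It rests on several assumptions. Column names are ordered byte-wise (BytesType-style). Plain hashes stand in for the Thrift result objects. `slice_columns` is a hypothetical helper, not part of the Cassandra API.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for two rows of the Users column family.
my %users = (
    'foo@example.com' => {
        email                   => 'foo@example.com',
        person_id               => '123456',
        'send_dates_2009-09-30' => '2245',
        'send_dates_2009-10-01' => '2247',
    },
    'bar@example.com' => {
        email                   => 'bar@example.com',
        person_id               => '654321',
        'send_dates_2009-09-30' => '2245',
    },
);

# Mimic a SliceRange: keep columns whose names sort between $start and
# $finish (inclusive), assuming byte-wise name ordering, and return them
# as an array ref, like the $value that get_slice hands back.
sub slice_columns {
    my ($row, $start, $finish) = @_;
    return [ map  { +{ name => $_, value => $row->{$_} } }
             grep { $_ ge $start && $_ le $finish }
             sort keys %$row ];
}

my %counter;
for my $email (sort keys %users) {
    my $value = slice_columns($users{$email},
                              'send_dates_2009-09-29', 'send_dates_2009-10-30');
    # scalar(@$value) is the same number as $#{$value} + 1.
    $counter{ scalar @$value }++;
}

printf "%d row(s) with %d send date(s)\n", $counter{$_}, $_
    for sort { $a <=> $b } keys %counter;
```

`scalar @$value` reads a little more directly than `$#{$value} + 1`, and both give the number of columns returned.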
>> For the most part, this script times out after a minute or so. Replacing
>> the get_slice with a get_count, I can get to about 2 million queries
>> before the timeout. Replacing the get_slice with a get, I make it to about
>> 2.5 million. The only way I could get it to run all the way through was to
>> add a 1/100-second sleep on every iteration.
>> I was able to get the script to complete when I shut down everything else
>> on the machine (it took 177 minutes). But since this is a
>> semi-production machine, I had to turn everything back on afterwards.
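A 1/100-second sleep per iteration throttles the request rate. Another common option is to retry only the calls that hit the receive timeout. Here is a minimal sketch under stated assumptions. `call_with_retry` and `error_text` are hypothetical helpers. A timeout is recognized by the "timed out" text of the error. Exception objects are assumed to carry that text in a `{message}` field. A fake closure stands in for `$client->get_slice`.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);            # sleep() accepts fractional seconds
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: get the text out of a plain die string or an
# exception object (assumed to keep its text in a {message} field).
sub error_text {
    my ($err) = @_;
    if (blessed($err) && reftype($err) eq 'HASH' && defined $err->{message}) {
        return $err->{message};
    }
    return "$err";
}

# Hypothetical helper: run $code, retrying up to $max_tries times when the
# failure looks like a receive timeout, pausing a little longer each time.
sub call_with_retry {
    my ($code, $max_tries, $pause) = @_;
    for my $try (1 .. $max_tries) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };
        return @result if $ok;
        my $err = $@;
        die $err unless error_text($err) =~ /timed out/;   # only retry timeouts
        die $err if $try == $max_tries;                    # out of attempts
        sleep($pause * $try);                              # linear backoff
    }
}

# Fake client call that times out twice before answering.
my $calls = 0;
my $fake_get_slice = sub {
    $calls++;
    die "TSocket: timed out reading 4 bytes from localhost:9160\n" if $calls < 3;
    return [ 'col1', 'col2' ];
};

my ($value) = call_with_retry($fake_get_slice, 5, 0.01);
printf "succeeded after %d calls with %d columns\n", $calls, scalar @$value;
```

Retrying hides transient stalls, but it also adds load to a server that is already slow to answer, so the per-iteration sleep (for example `sleep(0.01)` with Time::HiRes) may still be needed alongside it.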
>> So for poops and laughs (at the recommendation of jbellis), I rewrote the
>> script in Python, and it has since run fully (using get_slice) 3 times
>> without timing out (approximately 130 minutes in Python), with everything
>> else running on the machine.
>> My question: I have seen this same thing with the PHP API, and it is my
>> understanding that the Perl API was based on the PHP API. Could
>> http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here too?
>> Is anyone else seeing this issue? If so, have you gotten around it?
>> Thanks.
>> -e
