incubator-cassandra-user mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: Thrift Perl API Timeout Issues
Date Thu, 15 Oct 2009 15:37:57 GMT
You need to call $socket->setRecvTimeout() with a higher value (the
timeout is given in milliseconds).
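A minimal sketch of where that call fits when setting up the client. It assumes the Perl bindings generated from Cassandra's Thrift interface are installed as Cassandra::Cassandra (client class Cassandra::CassandraClient) and that Cassandra listens on localhost:9160, as in the error message quoted below. The 10000 ms timeouts are arbitrary example values, not recommended settings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Thrift::Socket;
use Thrift::BufferedTransport;
use Thrift::BinaryProtocol;
# Assumed module names for the generated Cassandra bindings.
use Cassandra::Cassandra;
use Cassandra::Types;

# Assumed connection details; adjust for your setup.
my ($host, $port) = ('localhost', 9160);

my $socket = Thrift::Socket->new($host, $port);

# Both timeouts are in milliseconds. Raising the receive timeout gives
# slow replies more time before "timed out reading 4 bytes" is raised.
$socket->setSendTimeout(10_000);    # example value
$socket->setRecvTimeout(10_000);    # example value

my $transport = Thrift::BufferedTransport->new($socket, 1024, 1024);
my $protocol  = Thrift::BinaryProtocol->new($transport);
my $client    = Cassandra::CassandraClient->new($protocol);

$transport->open();

# ... issue get_slice / get / get_count calls with $client here ...

$transport->close();
```

Setting the timeout once, right after constructing the socket, should apply to every subsequent read made through that socket, so the get_slice loop itself does not need to change.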


On Oct 15, 2009, at 11:26 AM, Eric Lubow <eric.lubow@gmail.com> wrote:

> Using the Thrift Perl API into Cassandra, I am running into what is  
> endearingly referred to as the 4 bytes of doom:
>
>  TSocket: timed out reading 4 bytes from localhost:9160
>
> The script I am using is fairly simple.  I have a text file that has  
> about 3.6 million lines that are formatted like:  foo@bar.com  1234
>
> The Cassandra dataset is a single column family called Users in the  
> Mailings keyspace with a data layout of:
> Users = {
>     'foo@example.com': {
>         email: 'foo@example.com',
>         person_id: '123456',
>         send_dates_2009-09-30: '2245',
>         send_dates_2009-10-01: '2247',
>     },
> }
> There are about 3.5 million rows in the Users column family and each  
> row has no more than 4 columns (listed above).  Some only have 3  
> (one of the send_dates_YYYY-MM-DD isn't there).
>
> The script parses it and then connects to Cassandra and does a  
> get_slice and counts the return values adding that to a hash:
>      my ($value) = $client->get_slice(
>          'Mailings',
>          $email,
>          Cassandra::ColumnParent->new({
>                  column_family => 'Users',
>              }),
>          Cassandra::SlicePredicate->new({
>                  slice_range => Cassandra::SliceRange->new({
>                          start => 'send_dates_2009-09-29',
>                          finish => 'send_dates_2009-10-30',
>                      }),
>              }),
>          Cassandra::ConsistencyLevel::ONE
>      );
>      $counter{($#{$value} + 1)}++;
>
> For the most part, this script times out after 1 minute or so.  
> Replacing the get_slice with a get_count, I can get it to about 2  
> million queries before I get the timeout.  Replacing the get_slice  
> with a get, I make it to about 2.5 million before I get the  
> timeout.  The only way I could get it to run all the way through was  
> to add a 1/100 of a second sleep during every iteration.  I was able  
> to get the script to complete when I shut down everything else on  
> the machine (and it took 177m to complete).  But since this is a  
> semi-production machine, I had to turn everything back on afterwards.
>
> So for poops and laughs (at the recommendation of jbellis), I  
> rewrote the script in Python and it has since run (using get_slice)  
> 3 times fully without timing out (approximately 130m in Python) with  
> everything else running on the machine.
>
> My question: I have seen this same thing in the PHP API, and it is my
> understanding that the Perl API was based on the PHP API. Could
> http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here
> too?  Is anyone else seeing this issue?  If so, have you gotten
> around it?
>
> Thanks.
>
> -e
