incubator-cassandra-user mailing list archives

From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Thrift Perl API Timeout Issues
Date Thu, 15 Oct 2009 18:38:08 GMT
I see a similar thing happening all the time.  I get around it by closing
the current connection and reconnecting after a sleep.  I am able to do
quite a few inserts between errors, though, so I'm not sure it's exactly
the same problem.
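
A minimal sketch of the close, sleep, and reconnect workaround, written in Python (the language of the rewrite mentioned in the quoted message) rather than Perl. `ReconnectingClient`, `connect`, `operation`, and `retry_on` are hypothetical names, not part of the Thrift or Cassandra APIs. `connect` is assumed to return an object with a `close()` method, and the default `retry_on=(OSError,)` is an assumption that covers Python socket timeouts; it would need adjusting to whatever exception your Thrift bindings actually raise.

```python
import time


class ReconnectingClient:
    """Holds one connection; on a retryable error, closes it, sleeps, reconnects."""

    def __init__(self, connect, delay=1.0, retries=3,
                 retry_on=(OSError,), sleep=time.sleep):
        self._connect = connect      # hypothetical factory returning a connection
        self._delay = delay          # seconds to wait before reconnecting
        self._retries = retries      # extra attempts allowed per call
        self._retry_on = retry_on    # exception types treated as timeouts (assumption)
        self._sleep = sleep          # injectable so the wait can be stubbed out
        self._conn = connect()

    def call(self, operation):
        """Run operation(conn), reconnecting and retrying on a retryable error."""
        for attempt in range(self._retries + 1):
            try:
                return operation(self._conn)
            except self._retry_on:
                if attempt == self._retries:
                    raise            # out of attempts: surface the error
                self._conn.close()   # drop the stalled connection
                self._sleep(self._delay)
                self._conn = self._connect()

    def close(self):
        self._conn.close()
```

Each insert or read would go through `client.call(lambda conn: ...)`, so the connection is reused between errors and only replaced when a call fails.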

-Anthony

On Thu, Oct 15, 2009 at 11:26:08AM -0400, Eric Lubow wrote:
> Using the Thrift Perl API into Cassandra, I am running into what is
> endearingly referred to as the 4 bytes of doom:
>  TSocket: timed out reading 4 bytes from localhost:9160
> 
> The script I am using is fairly simple.  I have a text file that has about
> 3.6 million lines that are formatted like:  foo@bar.com  1234
> 
> The Cassandra dataset is a single column family called Users in the Mailings
> keyspace with a data layout of:
> Users = {
>     'foo@example.com': {
>         email: 'foo@example.com',
>         person_id: '123456',
>         send_dates_2009-09-30: '2245',
>         send_dates_2009-10-01: '2247',
>     },
> }
> There are about 3.5 million rows in the Users column family and each row has
> no more than 4 columns (listed above).  Some only have 3 (one of the
> send_dates_YYYY-MM-DD isn't there).
> 
> The script parses the file, connects to Cassandra, does a get_slice for
> each email, and tallies the number of columns returned in a hash:
>      my ($value) = $client->get_slice(
>          'Mailings',
>          $email,
>          Cassandra::ColumnParent->new({
>                  column_family => 'Users',
>              }),
>          Cassandra::SlicePredicate->new({
>                  slice_range => Cassandra::SliceRange->new({
>                          start => 'send_dates_2009-09-29',
>                          finish => 'send_dates_2009-10-30',
>                      }),
>              }),
>          Cassandra::ConsistencyLevel::ONE
>      );
>      $counter{($#{$value} + 1)}++;
> 
> For the most part, this script times out after 1 minute or so. Replacing the
> get_slice with a get_count, I can get it to about 2 million queries before I
> get the timeout.  Replacing the get_slice with a get, I make it to about 2.5
> million before I get the timeout.  The only way I could get it to run all
> the way through was to add a 1/100-second sleep during every iteration.
> I was able to get the script to complete when I shut down everything else
> on the machine (and it took 177m to complete).  But since this is a
> semi-production machine, I had to turn everything back on afterwards.
> 
> So for poops and laughs (at the recommendation of jbellis), I rewrote the
> script in Python and it has since run (using get_slice) 3 times fully
> without timing out (approximately 130m in Python) with everything else
> running on the machine.
> 
> My question: I have seen this same thing in the PHP API, and it is my
> understanding that the Perl API was based on the PHP API, so could
> http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here too?  Is
> anyone else seeing this issue?  If so, how have you gotten around it?
> 
> Thanks.
> 
> -e

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>
