cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Thrift Perl API Timeout Issues
Date Thu, 15 Oct 2009 15:40:07 GMT
What is the default?

On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <jakers@gmail.com> wrote:
> You need to call
> $socket->setRecvTimeout()
> with a higher value, in milliseconds.
>
>
> On Oct 15, 2009, at 11:26 AM, Eric Lubow <eric.lubow@gmail.com> wrote:
>
> Using the Thrift Perl API into Cassandra, I am running into what is
> endearingly referred to as the 4 bytes of doom:
>  TSocket: timed out reading 4 bytes from localhost:9160
> The script I am using is fairly simple.  I have a text file that has about
> 3.6 million lines that are formatted like:  foo@bar.com  1234
> The Cassandra dataset is a single column family called Users in the Mailings
> keyspace with a data layout of:
> Users = {
>     'foo@example.com': {
>         email: 'foo@example.com',
>         person_id: '123456',
>         send_dates_2009-09-30: '2245',
>         send_dates_2009-10-01: '2247',
>     },
> }
> There are about 3.5 million rows in the Users column family and each row has
> no more than 4 columns (listed above).  Some only have 3 (one of the
> send_dates_YYYY-MM-DD isn't there).
> The script parses the file, connects to Cassandra, does a get_slice for each
> line, and tallies the number of columns returned in a hash:
>      my ($value) = $client->get_slice(
>          'Mailings',
>          $email,
>          Cassandra::ColumnParent->new({
>                  column_family => 'Users',
>              }),
>          Cassandra::SlicePredicate->new({
>                  slice_range => Cassandra::SliceRange->new({
>                          start => 'send_dates_2009-09-29',
>                          finish => 'send_dates_2009-10-30',
>                      }),
>              }),
>          Cassandra::ConsistencyLevel::ONE
>      );
>      $counter{ scalar @{$value} }++;  # tally rows by number of columns returned
> For the most part, this script times out after a minute or so.  Replacing the
> get_slice with a get_count, I can get to about 2 million queries before I
> get the timeout.  Replacing the get_slice with a get, I make it to about 2.5
> million before I get the timeout.  The only way I could get it to run all
> the way through was to add a 1/100-second sleep to every iteration.
> I was also able to get the script to complete when I shut down everything
> else on the machine (and it took 177m to complete).  But since this is a
> semi-production machine, I had to turn everything back on afterwards.
> So for poops and laughs (at the recommendation of jbellis), I rewrote the
> script in Python and it has since run (using get_slice) 3 times fully
> without timing out (approximately 130m in Python) with everything else
> running on the machine.
> My question: I have seen this same thing in the PHP API, and it is my
> understanding that the Perl API was based on the PHP API, so
> could http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here
> too?  Is anyone else seeing this issue?  If so, how have you gotten around it?
> Thanks.
> -e
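
The loop described in the quoted message (read "email  id" lines, fetch the row, tally by number of columns returned) can be sketched with a pause-and-retry around the fetch. This is only an illustrative sketch, not something proposed in the thread: fetch_with_retry, tally_columns, and the $fetch callback are hypothetical names, and the callback here is a stub standing in for the real get_slice call so the example runs without Thrift or Cassandra. The 0.01-second pause mirrors the 1/100-second sleep mentioned above; raising the socket receive timeout with setRecvTimeout, as suggested in the thread, addresses the same symptom at the socket level.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # sleep() that accepts fractional seconds

# Hypothetical helper: call $fetch->($email); if it dies (for example on a
# socket timeout), pause briefly and try again, up to $max_tries attempts.
sub fetch_with_retry {
    my ($fetch, $email, $max_tries, $pause) = @_;
    for my $try (1 .. $max_tries) {
        my $columns;
        return $columns if eval { $columns = $fetch->($email); 1 };
        die $@ if $try == $max_tries;   # out of attempts: rethrow last error
        sleep $pause;
    }
}

# Hypothetical helper: build a histogram keyed by the number of columns
# returned for each email address found in the input lines.
sub tally_columns {
    my ($lines, $fetch) = @_;
    my %counter;
    for my $line (@$lines) {
        my ($email) = split ' ', $line;   # first whitespace-separated field
        next unless defined $email;       # skip blank lines
        my $columns = fetch_with_retry($fetch, $email, 3, 0.01);
        $counter{ scalar @$columns }++;
    }
    return \%counter;
}

# Stub in place of $client->get_slice(...): the first call simulates a
# timeout, later calls return canned column lists (assumed data).
my $attempts = 0;
my %fake = (
    'foo@example.com' => [ 'send_dates_2009-09-30', 'send_dates_2009-10-01' ],
    'bar@example.com' => [ 'send_dates_2009-10-01' ],
);
my $fetch = sub {
    my ($email) = @_;
    die "TSocket: timed out reading 4 bytes\n" if ++$attempts == 1;
    return $fake{$email} || [];
};

my $counter = tally_columns(
    [ 'foo@example.com  1234', 'bar@example.com  5678', 'baz@example.com  9' ],
    $fetch,
);
print "$_ column(s): $counter->{$_} row(s)\n" for sort keys %$counter;
```

In a real script the stub would be replaced by a closure around the get_slice call shown in the quoted message, returning the array reference it yields.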
