Using the Thrift Perl API to talk to Cassandra, I am running into what is endearingly referred to as the 4 bytes of doom:

 TSocket: timed out reading 4 bytes from localhost:9160

The script I am using is fairly simple.  I have a text file with about 3.6 million lines, each formatted like:  foo@bar.com  1234

The Cassandra dataset is a single column family called Users in the Mailings keyspace with a data layout of:
Users = { 
    'foo@example.com': {
        email: 'foo@example.com',
        person_id: '123456',
        send_dates_2009-09-30: '2245',
        send_dates_2009-10-01: '2247',
    },
}
There are about 3.5 million rows in the Users column family, and each row has no more than 4 columns (those listed above).  Some have only 3 (one of the send_dates_YYYY-MM-DD columns isn't there).

The script parses each line, connects to Cassandra, does a get_slice per email, and tallies the number of columns returned into a hash:
     # Pull just the send_dates_* columns for this row, at consistency ONE
     my ($value) = $client->get_slice(
         'Mailings',                    # keyspace
         $email,                        # row key
         Cassandra::ColumnParent->new({
             column_family => 'Users',
         }),
         Cassandra::SlicePredicate->new({
             slice_range => Cassandra::SliceRange->new({
                 start  => 'send_dates_2009-09-29',
                 finish => 'send_dates_2009-10-30',
             }),
         }),
         Cassandra::ConsistencyLevel::ONE
     );

     # Tally how many columns came back for this email
     $counter{ $#{$value} + 1 }++;
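
For context, here is roughly what surrounds that call.  The Thrift::* classes are the stock Thrift Perl library; the Cassandra::* names come from my generated bindings and may differ depending on how yours were built; the filename and parsing are placeholders.  Treat it as a sketch of the script, not the script itself:

     use Thrift;
     use Thrift::Socket;
     use Thrift::BufferedTransport;
     use Thrift::BinaryProtocol;
     use Cassandra::Cassandra;        # Thrift-generated client
     use Cassandra::Types;            # Thrift-generated structs

     # One connection for the whole run
     my $socket    = Thrift::Socket->new('localhost', 9160);
     my $transport = Thrift::BufferedTransport->new($socket);
     my $protocol  = Thrift::BinaryProtocol->new($transport);
     my $client    = Cassandra::CassandraClient->new($protocol);
     $transport->open();

     my %counter;
     open my $fh, '<', 'users.txt' or die "users.txt: $!";
     while (my $line = <$fh>) {
         chomp $line;
         my ($email, $person_id) = split /\s+/, $line;

         # ... the get_slice and $counter increment shown above ...
     }
     $transport->close();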

Run as-is, this script times out after a minute or so.  Replacing the get_slice with a get_count, I can get to about 2 million queries before I hit the timeout; replacing it with a get, I make it to about 2.5 million.  The only way I could get it to run all the way through was to add a 1/100th-of-a-second sleep to every iteration.  I was also able to get the script to complete by shutting down everything else on the machine (it took 177m that way), but since this is a semi-production machine, I had to turn everything back on afterwards.
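
For the record, the throttle was nothing fancier than a Time::HiRes sleep inside the loop, along these lines:

     use Time::HiRes qw(usleep);

     while (my $line = <$fh>) {
         # ... parse + get_slice + count as above ...
         usleep(10_000);    # 10,000 microseconds = 1/100th of a second
     }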

So for poops and laughs (at the recommendation of jbellis), I rewrote the script in Python, and it has since run to completion (using get_slice) 3 times without timing out, taking approximately 130m each time, with everything else running on the machine.

My question is this: I have seen this same thing with the PHP API, and it is my understanding that the Perl API was based on the PHP one, so could http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here too?  Is anyone else seeing this issue?  If so, how have you gotten around it?
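
The only client-side knob I can see to poke at is the socket timeout pair.  I believe the Perl Thrift::Socket exposes the same setters as the PHP TSocket, and its default recv timeout looks quite short in Socket.pm, but I have not confirmed that raising it actually makes the problem go away; this is just where I would look next:

     my $socket = Thrift::Socket->new('localhost', 9160);
     $socket->setSendTimeout(3_000);     # milliseconds
     $socket->setRecvTimeout(15_000);    # the side that throws "timed out reading 4 bytes"
     # ... then wrap in the transport/protocol/client as before ...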

Thanks.

-e