incubator-cassandra-user mailing list archives

From "John R. Frank" <...@mit.edu>
Subject pycassa fails to write values larger than one-tenth thrift_framed_transport_size_in_mb, defaults to 15MB --> 1.5MB limit on values !?
Date Fri, 10 May 2013 16:39:35 GMT
C* users,

The simple code below demonstrates pycassa failing to write values 
larger than one-tenth of thrift_framed_transport_size_in_mb.  It writes 
single-column rows using UUID keys.

For example, with the default of

 	thrift_framed_transport_size_in_mb: 15

the code below shows pycassa failing on rows of 1.5MB of data (one-tenth 
of 15MB):

2013-05-10 16:09:13,251 10441:69: adding UUID('59f14d33-c4a6-49ce-80ce-48a3ee433cbf') with
1.500000 MB
Connection 140020167928080 (ec2-23-22-224-180.compute-1.amazonaws.com:9160) was checked out
from pool 140020167927056
Connection 140020167928080 (ec2-23-22-224-180.compute-1.amazonaws.com:9160) in pool 140020167927056
failed: [Errno 104] Connection reset by peer
2013-05-10 16:09:13,370 10441:25: connection_failed: reset pool??

Sometimes the error is "TSocket read 0 bytes" and other times "Connection 
reset by peer".  I have tried other values of 
thrift_framed_transport_size_in_mb, and the 10% pattern holds.  Possibly 
related to:

 	https://github.com/pycassa/pycassa/issues/168
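
If it helps anyone reproduce this, a bisection probe along these lines 
should find the threshold directly (a rough sketch -- it assumes the pool 
and column family setup from the full script below, and treats any 
exception from the insert as "value too big"):

import uuid
import pycassa

def probe_max_value_size(pool, family, lo=0, hi=16 * 2**20):
    ## bisect on payload size down to 1 KB resolution; returns the
    ## largest single-column value (in bytes) that inserts cleanly
    cf = pycassa.ColumnFamily(pool, family)
    while hi - lo > 2**10:
        mid = (lo + hi) // 2
        try:
            cf.insert(uuid.uuid4(), {'test': ' ' * mid})
            lo = mid   ## write succeeded; try bigger
        except Exception:
            hi = mid   ## connection reset / TSocket error; try smaller
    return lo

e.g. probe_max_value_size(pool, family) should come back near 1.5MB with 
the default settings, if the pattern I am seeing is real.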

I thought the whole point of framed transport was to split things up so 
that messages could be larger than a single frame.  Is that wrong?  Writes 
are breaking at just 10% of the frame size.
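
For concreteness, as I read the Thrift Python library, a frame is just a 
4-byte big-endian length prefix followed by the payload -- a minimal 
sketch of the framing, not pycassa internals:

import struct

def frame(payload):
    ## wrap one serialized Thrift message payload in a single frame:
    ## 4-byte big-endian length prefix, then the payload bytes
    return struct.pack('!i', len(payload)) + payload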

Maybe we broke something else?

Pycassa is using framed transport -- see assert in code below.

This AWS m1.xlarge was constructed just for this test using the DataStax AMI:

   http://www.datastax.com/docs/1.2/install/install_ami

http://ec2-23-22-224-180.compute-1.amazonaws.com:8888/opscenter/index.html


The code and log output are attached.  The log was generated by running 
the script on another m1.xlarge in the same Amazon data center.

Thanks for any ideas.
-John

import uuid
import getpass
import logging
logger = logging.getLogger('test')
logger.setLevel(logging.INFO)

ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s %(process)d:%(lineno)d: %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

## get the Cassandra client library
import pycassa
from pycassa.pool import ConnectionPool
from pycassa.system_manager import SystemManager, SIMPLE_STRATEGY, \
     LEXICAL_UUID_TYPE, ASCII_TYPE, BYTES_TYPE

log = pycassa.PycassaLogger()
log.set_logger_name('pycassa_library')
log.set_logger_level('debug')
log.get_logger().addHandler(logging.StreamHandler())

class Listener(object):
     ## pool listener callback: pycassa calls this whenever a pooled
     ## connection fails
     def connection_failed(self, dic):
         logger.critical('connection_failed: reset pool??')

## this is an m1.xlarge doing nothing but supporting this test
server = 'ec2-23-22-224-180.compute-1.amazonaws.com:9160'
keyspace = 'testkeyspace_' + getpass.getuser().replace('-', '_')
family = 'testcf'
sm = SystemManager(server)
try:
     sm.drop_keyspace(keyspace)
except pycassa.InvalidRequestException:
     pass
sm.create_keyspace(keyspace, SIMPLE_STRATEGY, {'replication_factor': '1'})
sm.create_column_family(keyspace, family, super=False,
                        key_validation_class=LEXICAL_UUID_TYPE,
                        default_validation_class=LEXICAL_UUID_TYPE,
                        column_name_class=ASCII_TYPE)
sm.alter_column(keyspace, family, 'test', ASCII_TYPE)
sm.close()

pool = ConnectionPool(keyspace, [server], max_retries=10, pool_timeout=0,
                      pool_size=10, timeout=120)
pool.fill()
pool.add_listener(Listener())

## assert that we are using framed transport
import thrift
conn = pool._q.get()
assert isinstance(conn.transport, thrift.transport.TTransport.TFramedTransport)
pool._q.put(conn)

try:
     for k in range(20):
         ## write increasing data sizes: k * 256 KB per column value, so
         ## k=6 is the 1.5 MB row that fails at the default frame size
         big_data = ' ' * (2**18 * k)
         num_rows = 10
         keys = []
         rows = []
         for i in xrange(num_rows):
             key = uuid.uuid4()
             rows.append((key, dict(test=big_data)))
             keys.append(key)

         testcf = pycassa.ColumnFamily(pool, family)
         with testcf.batch() as batch:
             for (key, data_dict) in rows:
                 data_size = len(data_dict.values()[0])
                 logger.critical('adding %r with %.6f MB' % (key, float(data_size)/2**20))
                 batch.insert(key, data_dict)

         logger.critical('%d rows written' % num_rows)

finally:
     sm = SystemManager(server)
     try:
         sm.drop_keyspace(keyspace)
     except pycassa.InvalidRequestException:
         pass
     sm.close()
     logger.critical('clearing test keyspace: %r' % keyspace)