incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: pycassa fails to write values larger than one-tenth thrift_framed_transport_size_in_mb, defaults to 15MB --> 1.5MB limit on values !?
Date Tue, 14 May 2013 18:12:25 GMT
> 2013-05-10 16:09:13,250 10441:69: adding UUID('d8a31630-3b90-46d0-9e34-1b7638044c62') with 1.500000 MB
> 2013-05-10 16:09:13,250 10441:69: adding UUID('ba14b6fc-f9ed-4d33-b64d-d705f405ad22') with 1.500000 MB
> 2013-05-10 16:09:13,250 10441:69: adding UUID('0b08d080-ec72-4f6e-b4a1-1227b8e28dac') with 1.500000 MB
> 2013-05-10 16:09:13,250 10441:69: adding UUID('d6a9c9fc-5775-4da7-92db-2f7cc89e6761') with 1.500000 MB
> 2013-05-10 16:09:13,250 10441:69: adding UUID('970b13be-b61d-4ff5-b13c-dd329f307337') with 1.500000 MB
> 2013-05-10 16:09:13,251 10441:69: adding UUID('ba608688-adc1-4b3c-b97d-64bc7c26b997') with 1.500000 MB
> 2013-05-10 16:09:13,251 10441:69: adding UUID('c72574d5-f562-4780-8a86-b509138bf3d0') with 1.500000 MB
> 2013-05-10 16:09:13,251 10441:69: adding UUID('6e795783-2eb3-4a1f-84a5-0554d568c461') with 1.500000 MB
> 2013-05-10 16:09:13,251 10441:69: adding UUID('8f79775e-ef9d-4879-8b0f-4b2dd42c3f2b') with 1.500000 MB
> 2013-05-10 16:09:13,251 10441:69: adding UUID('59f14d33-c4a6-49ce-80ce-48a3ee433cbf') with 1.500000 MB
You are sending 10 rows, each with 1.5MB of column data; together with the row keys and request overhead, the total is over 15MB.

It's the size of the whole request that is limited, not the size of individual rows.
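Since the limit applies to the whole batch mutation, one workaround is to cap the serialized payload per batch before flushing. A minimal sketch (plain Python, not a pycassa API; the `chunk_rows` helper and the 50% headroom fraction are illustrative assumptions, with the headroom left for keys, column names, and Thrift overhead):

```python
def chunk_rows(rows, frame_mb=15, headroom=0.5):
    """Group (key, value) rows into batches whose combined column
    payload stays under a fraction of thrift_framed_transport_size_in_mb.
    headroom is an assumed safety margin for per-row overhead."""
    limit = int(frame_mb * 2**20 * headroom)
    batch, size = [], 0
    for key, value in rows:
        if batch and size + len(value) > limit:
            yield batch
            batch, size = [], 0
        batch.append((key, value))
        size += len(value)
    if batch:
        yield batch

# Ten 1.5MB values only fit in one 15MB frame with zero overhead,
# so split them across batches of at most ~7.5MB of column data.
rows = [(i, ' ' * (3 * 2**19)) for i in range(10)]  # 1.5MB each
batches = list(chunk_rows(rows))
```

Each yielded chunk can then be written through its own `testcf.batch()` context, keeping every request comfortably under the frame limit.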

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/05/2013, at 4:39 AM, John R. Frank <jrf@mit.edu> wrote:

> C* users,
> 
> The simple code below demonstrates pycassa failing to write values larger than one-tenth of thrift_framed_transport_size_in_mb. It writes single-column rows keyed by UUID.
> 
> For example, with the default of
> 
> 	thrift_framed_transport_size_in_mb: 15
> 
> the code below shows pycassa failing on rows of 1.5MB of data:
> 
> 2013-05-10 16:09:13,251 10441:69: adding UUID('59f14d33-c4a6-49ce-80ce-48a3ee433cbf') with 1.500000 MB
> Connection 140020167928080 (ec2-23-22-224-180.compute-1.amazonaws.com:9160) was checked out from pool 140020167927056
> Connection 140020167928080 (ec2-23-22-224-180.compute-1.amazonaws.com:9160) in pool 140020167927056 failed: [Errno 104] Connection reset by peer
> 2013-05-10 16:09:13,370 10441:25: connection_failed: reset pool??
> 2013-05-10 16:09:13,370 10441:25: connection_failed: reset pool??
> 
> Sometimes it says "TSocket read 0 bytes" and other times "Connection reset by peer". I have tried changing thrift_framed_transport_size_in_mb and the 10% pattern holds. Possibly related to:
> 
> 	https://github.com/pycassa/pycassa/issues/168
> 
> I thought the whole point of framed transport was to split things up so they can be larger
than a single frame.  Is that wrong?  This is breaking at just 10% of the frame size.
> 
> Maybe we broke something else?
> 
> Pycassa is using framed transport -- see assert in code below.
> 
> This AWS m1.xlarge was constructed just for this test using DataStax AMI:
> 
>  http://www.datastax.com/docs/1.2/install/install_ami
> 
> http://ec2-23-22-224-180.compute-1.amazonaws.com:8888/opscenter/index.html
> 
> 
> The code and log output are attached. The log was generated by a run on another m1.xlarge in the same Amazon data center.
> 
> Thanks for any ideas.
> -John
> 
> import uuid
> import getpass
> import logging
> logger = logging.getLogger('test')
> logger.setLevel(logging.INFO)
> 
> ch = logging.StreamHandler()
> ch.setLevel(logging.DEBUG)
> formatter = logging.Formatter('%(asctime)s %(process)d:%(lineno)d: %(message)s')
> ch.setFormatter(formatter)
> logger.addHandler(ch)
> 
> ## get the Cassandra client library
> import pycassa
> from pycassa.pool import ConnectionPool
> from pycassa.system_manager import SystemManager, SIMPLE_STRATEGY, \
>    LEXICAL_UUID_TYPE, ASCII_TYPE, BYTES_TYPE
> 
> log = pycassa.PycassaLogger()
> log.set_logger_name('pycassa_library')
> log.set_logger_level('debug')
> log.get_logger().addHandler(logging.StreamHandler())
> 
> class Listener(object):
>    def connection_failed(self, dic):
>        logger.critical('connection_failed: reset pool??')
> 
> ## this is an m1.xlarge doing nothing but supporting this test
> server = 'ec2-23-22-224-180.compute-1.amazonaws.com:9160'
> keyspace = 'testkeyspace_' + getpass.getuser().replace('-', '_')
> family = 'testcf'
> sm = SystemManager(server)
> try:
>    sm.drop_keyspace(keyspace)
> except pycassa.InvalidRequestException:
>    pass
> sm.create_keyspace(keyspace, SIMPLE_STRATEGY, {'replication_factor': '1'})
> sm.create_column_family(keyspace, family, super=False,
>                        key_validation_class = LEXICAL_UUID_TYPE,
>                        default_validation_class  = LEXICAL_UUID_TYPE,
>                        column_name_class = ASCII_TYPE)
> sm.alter_column(keyspace, family, 'test', ASCII_TYPE)
> sm.close()
> 
> pool = ConnectionPool(keyspace, [server], max_retries=10, pool_timeout=0, pool_size=10, timeout=120)
> pool.fill()
> pool.add_listener(Listener())
> 
> ## assert that we are using framed transport
> import thrift
> conn = pool._q.get()
> assert isinstance(conn.transport, thrift.transport.TTransport.TFramedTransport)
> pool._q.put(conn)
> 
> try:
>    for k in range(20):
>        ## write some data to cassandra using increasing data sizes
>        big_data = ' ' * 2**18 * k
>        num_rows = 10
>        keys = []
>        rows = []
>        for i in xrange(num_rows):
>            key = uuid.uuid4()
>            rows.append((key, dict(test=big_data)))
>            keys.append(key)
> 
>        testcf = pycassa.ColumnFamily(pool, family)
>        with testcf.batch() as batch:
>            for (key, data_dict) in rows:
>                data_size = len(data_dict.values()[0])
>                logger.critical('adding %r with %.6f MB' % (key, float(data_size)/2**20))
>                batch.insert(key, data_dict)
> 
>        logger.critical('%d rows written' % num_rows)
> 
> finally:
>    sm = SystemManager(server)
>    try:
>        sm.drop_keyspace(keyspace)
>    except pycassa.InvalidRequestException:
>        pass
>    sm.close()
>    logger.critical('clearing test keyspace: %r' % keyspace)

