cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Multiple inserts cause consistency failures
Date Tue, 09 Nov 2010 20:20:40 GMT
time.time() returns the number of seconds since the epoch as a float, including a fractional
part. The timestamp param for insert is defined as a 64-bit int, so my guess is that Thrift
is passing your timestamp through int(), which truncates it to whole seconds. Because all 3
iterations of your loop run within the same second, you are always sending the same timestamp.

You are seeing the foobar2@ value occur multiple times because when a timestamp collision
occurs, the column with the highest byte value wins, and foobar2@example.com compares higher
than foobar1@example.com.
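
To make the tie-break concrete, here is a toy model of the behaviour described above. It is only a sketch, not Cassandra's actual code: the function names reconcile and simulate are hypothetical, and the only rules it encodes are "newer timestamp wins" and "on a tie, the higher byte value wins".

```python
def reconcile(current, incoming):
    """Return the (timestamp, value) pair that survives a write.

    Toy model only: a newer timestamp wins, and on equal timestamps
    the value that compares higher byte-wise wins.
    """
    if current is None:
        return incoming
    cur_ts, cur_val = current
    new_ts, new_val = incoming
    if new_ts != cur_ts:
        return incoming if new_ts > cur_ts else current
    # Timestamp collision: the higher byte value wins.
    return incoming if new_val > cur_val else current


def simulate(timestamp):
    """Replay the loop from the original post with one shared timestamp."""
    stored = None
    reads = []
    for _ in range(3):
        for value in (b"foobar1@example.com", b"foobar2@example.com"):
            stored = reconcile(stored, (timestamp, value))
            reads.append(stored[1])  # what a read after this write would see
    return reads


reads = simulate(1289332871)
for value in reads:
    print(value.decode())
```

Under these assumptions the first read sees foobar1 (nothing to collide with yet), and every later read sees foobar2, which matches the pattern in the quoted output.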

Use this as your timestamp: int(time.time() * 1e6)
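
A small sketch of the difference this makes. The helper name micros_timestamp and the fixed clock readings are illustrative assumptions so the result is reproducible; in real code you would just call time.time().

```python
import time


def micros_timestamp(now=None):
    """Microseconds since the epoch as an int, per the suggestion above."""
    if now is None:
        now = time.time()
    return int(now * 1e6)


# Three writes one millisecond apart, using a fixed example clock reading
# (an assumption for illustration, not taken from a real run).
clock_readings = [1289332871.25 + i * 0.001 for i in range(3)]

# Truncating to whole seconds collapses all three to the same timestamp.
second_resolution = [int(t) for t in clock_readings]

# Scaling to microseconds first keeps them distinct and increasing.
micro_resolution = [micros_timestamp(t) for t in clock_readings]

print(second_resolution)
print(micro_resolution)
```

Note that two writes issued within the same microsecond could still collide, so this narrows the window rather than removing it entirely.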

Aaron

On 10 Nov, 2010, at 08:59 AM, Rajat Chopra <rchopra@makara.com> wrote:

Requesting the forum’s kind attention to consistency failures that I notice.
 
Cassandra version - 0.6.4
Thrift version – 0.4.0
Driving Language – Python
Machine – 4 core, 8G, Fedora 13, i686
storage_conf.xml - default
 
I took the example from
http://wiki.apache.org/cassandra/ThriftExamples#Python
and changed it a little to do multiple inserts with different values on the same column:
 
    try:
        transport.open()
        # Insert the data into Keyspace 1
        value1 = "foobar1@example.com"
        value2 = "foobar2@example.com"

        for x in range(3):
            client.insert(keyspace, key, column_path, value1, time.time(), ConsistencyLevel.ALL)
            result1 = client.get_slice(keyspace, key, column_parent, predicate, ConsistencyLevel.ALL)
            client.insert(keyspace, key, column_path, value2, time.time(), ConsistencyLevel.ALL)
            result2 = client.get_slice(keyspace, key, column_parent, predicate, ConsistencyLevel.ALL)
            pp.pprint(result1)
            pp.pprint(result2)
 
 
And the output I see is:
[ ColumnOrSuperColumn(column=Column(timestamp=1289332871, name='email', value='foobar1@example.com'), super_column=None)]
[ ColumnOrSuperColumn(column=Column(timestamp=1289332871, name='email', value='foobar2@example.com'), super_column=None)]
[ ColumnOrSuperColumn(column=Column(timestamp=1289332871, name='email', value='foobar2@example.com'), super_column=None)]
[ ColumnOrSuperColumn(column=Column(timestamp=1289332871, name='email', value='foobar2@example.com'), super_column=None)]
[ ColumnOrSuperColumn(column=Column(timestamp=1289332871, name='email', value='foobar2@example.com'), super_column=None)]
[ ColumnOrSuperColumn(column=Column(timestamp=1289332871, name='email', value='foobar2@example.com'), super_column=None)]
 
 
Should I not get ‘foobar1, foobar2, foobar1, foobar2, foobar1, foobar2’, especially
when I give ConsistencyLevel.ALL (silly though that is on a single-node cluster)?
 
My first guess at what is going wrong is the timestamp, which then raises the question of
timestamp resolution. Nevertheless, why do I see foobar1/foobar2 in the first iteration?
Is issue 876 related to this? (https://issues.apache.org/jira/browse/CASSANDRA-876)
If this is expected behavior, can anyone suggest a workaround for my piece of code, which
expects quick updates to the same column?
 
I would appreciate any help/insight into the matter.
Sincere thanks,
Rajat
 