cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5540) Concurrent secondary index updates remove rows from the index
Date Mon, 06 May 2013 20:02:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650050#comment-13650050
] 

Jonathan Ellis commented on CASSANDRA-5540:
-------------------------------------------

Hmm.  It looks like this can happen when multiple inserts happen at the same timestamp, since
we delete the existing entry with its own timestamp.  But if the replacement has the same
timestamp, then the tombstone wins the tie.

Any clever ideas to fix this [~beobal]?
                
> Concurrent secondary index updates remove rows from the index
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-5540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5540
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>            Reporter: Alexei Bakanov
>
> Existing rows disappear from secondary index when doing simultaneous updates of a row
with the same secondary index value.
> Here is a little pycassa script that reproduces a bug. The script inserts 4 rows with
same secondary index value, reads those rows back and check that there are 4 of them.
> Please run two instances of the script simultaneously in two separate terminals in order
to simulate concurrent updates:
> {code}
> -----scrpit.py START-----
> import pycassa
> from pycassa.index import *
> pool = pycassa.ConnectionPool('ks123')
> cf = pycassa.ColumnFamily(pool, 'cf1')
> while True:
>     for rowKey in xrange(4):
>         cf.insert(str(rowKey), {'indexedColumn': 'indexedValue'})
>     index_expression = create_index_expression('indexedColumn', 'indexedValue')
>     index_clause = create_index_clause([index_expression])
>     rows = cf.get_indexed_slices(index_clause)
>     length = len(list(rows))
>     if length == 4:
>         pass
>     else:
>         print 'found just %d rows out of 4' % length
> pool.dispose()
> ---script.py FINISH---
> ---schema cli start---
> create keyspace ks123
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {datacenter1 : 1}
>   and durable_writes = true;
> use ks123;
> create column family cf1
>   with column_type = 'Standard'
>   and comparator = 'AsciiType'
>   and default_validation_class = 'AsciiType'
>   and key_validation_class = 'AsciiType'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and populate_io_cache_on_flush = false
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and column_metadata = [
>     {column_name : 'indexedColumn',
>     validation_class : AsciiType,
>     index_name : 'INDEX1',
>     index_type : 0}]
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> ---schema cli finish---
> {code}
> Test cluster created with 'ccm create --cassandra-version 1.2.4 --nodes 1 --start testUpdate'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message