cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-2436) Secondary Index Updates Invalidate Data Set
Date Fri, 08 Apr 2011 01:57:05 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2436.
---------------------------------------

    Resolution: Not A Problem

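default_validation_class (d_v_c) means "all data that isn't explicitly in column_metadata conforms
to this data type."  So you've violated that: your update replaced the "colour" entry with "color",
so the existing "colour" values, written as LongType, now fall under the UTF8Type default and fail
validation on read.  You have two options: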

- set d_v_c to BytesType (the default)
- leave the column definition alone, but only drop the index part (maybe this is what you
were trying to do, but you changed from "colour" to "color")

More generally, note that best practice is to use d_v_c only in CFs with dynamic column names.
That is, if you know ahead of time what the columns in the CF are going to be, as you do here,
you shouldn't use d_v_c.
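
To see why the read fails, here is a minimal sketch of the validator lookup described above. It is a toy model, not Cassandra's or pycassa's actual code: validate_long, validate_utf8 and read_column are hypothetical names, and the only facts taken from the report are the stored value (1234 under LongType) and the two column_metadata definitions.

```python
import binascii
import struct


# Hypothetical stand-ins for the LongType and UTF8Type validators.
def validate_long(raw):
    if len(raw) != 8:
        raise ValueError("expected 8 bytes for a long")
    return struct.unpack(">q", raw)[0]


def validate_utf8(raw):
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        hex_bytes = binascii.hexlify(raw).decode("ascii")
        raise ValueError("invalid UTF8 bytes " + hex_bytes)


def read_column(name, raw, column_metadata, default_validator):
    # Columns not listed in column_metadata fall back to the default validator.
    validator = column_metadata.get(name, default_validator)
    return validator(raw)


# Value written by: set cf_testing['key']['colour']='1234';
# Under LongType it is stored as an 8-byte big-endian long.
stored = struct.pack(">q", 1234)

before = {"colour": validate_long}  # original column_metadata
after = {"color": validate_long}    # column_metadata after the update


def try_read(column_metadata):
    try:
        return read_column("colour", stored, column_metadata, validate_utf8)
    except ValueError as exc:
        return str(exc)


result_before = try_read(before)
result_after = try_read(after)
print(result_before)
print(result_after)
```

With the original metadata the stored bytes decode as a long. Once "colour" is no longer listed, the same bytes are checked against the UTF8Type default and rejected, which matches the error in the report below.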

> Secondary Index Updates Invalidate Data Set
> -------------------------------------------
>
>                 Key: CASSANDRA-2436
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2436
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>         Environment: RedHat Linux 5.5 - OS is not important here.
>            Reporter: Dexter Fryar
>            Priority: Blocker
>              Labels: index, indexed, indexing, read, write
>
> Creating an index, validator, and default validator, then renaming or dropping the index
> later, results in read errors and an invalid, unreadable data set.
> Updating the CF with the old index will not resolve the problem. You can insert/write
> all you want, but reads will fail if you come across a row that included one of these cases.
> The only workaround that I've been able to use is to know exactly what the columns/changes
> were prior to the CF change and iterate through all the rows, inserting the same column name
> with a NULL value. One problem here is that you __must__ know exactly what the row keys
> are called, because you can't do a read to get them.
> 1) create a secondary index on a column with a validator and a default validator
> 2) insert a row
> 3) read and verify the row
> 4) update the CF/index/name/validator
> 5) read the CF and get an error (CLI or Pycassa)
> CLI commands to create the row and CF/index:
> create column family cf_testing with comparator=UTF8Type and default_validation_class=UTF8Type and column_metadata=[{column_name: colour, validation_class: LongType, index_type: KEYS}];
> set cf_testing['key']['colour']='1234';
> list cf_testing;
> update column family cf_testing with comparator=UTF8Type and default_validation_class=UTF8Type and column_metadata=[{column_name: color, validation_class: LongType, index_type: KEYS}];
> ERROR from the CLI:
> list cf_testing;
> Using default limit of 100
> -------------------
> RowKey: key
> invalid UTF8 bytes 00000000000004d2
> Here is the Pycassa client code that shows this error too.
> badindex.py
> #!/usr/local/bin/python2.7
> # Reads the single row 'key' from cf_testing to reproduce the read error.
> import sys
> import pycassa
> def main():
>   try:
>     keyspace = "badindex"
>     serverPoolList = ['localhost:9160']
>     pool = pycassa.connect(keyspace, serverPoolList)
>   except Exception:
>     print "couldn't get a connection"
>     sys.exit(1)
>   cfname = "cf_testing"
>   cf = pycassa.ColumnFamily(pool, cfname)
>   # This range read is where the error surfaces.
>   results = cf.get_range(start='key', finish='key', row_count=1)
>   for key, columns in results:
>     print key, '=>', columns
> if __name__ == "__main__":
>   main()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
