From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Fri, 8 Apr 2011 01:57:05 +0000 (UTC)
Message-ID: <163556708.42726.1302227825860.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2052312776.41857.1302202205748.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Resolved] (CASSANDRA-2436) Secondary Index Updates Invalidate Data Set
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm

[
https://issues.apache.org/jira/browse/CASSANDRA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2436.
---------------------------------------

    Resolution: Not A Problem

default_validation_class means "all data that isn't explicitly in column_metadata conforms to this data type." So you've violated that.

You have two options:
- set d_v_c to BytesType (the default)
- leave the column definition alone, but only drop the index part (maybe this is what you were trying to do, but you changed from "colour" to "color")

More generally, note that best practice is to only use d_v_c in CFs with dynamic column names. I.e., if you know what the columns are going to be in the CF ahead of time as you do here, you shouldn't use d_v_c.

> Secondary Index Updates Invalidate Data Set
> -------------------------------------------
>
>                 Key: CASSANDRA-2436
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2436
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>         Environment: RedHat Linux 5.5 - OS is not important here.
>            Reporter: Dexter Fryar
>            Priority: Blocker
>              Labels: index, indexed, indexing, read, write
>
> Creating an index, validator, and default validator then renaming/dropping the index later results in read errors and an invalid unreadable data set.
> Updating the CF with the old index will not resolve the problem. You can insert/write all you want, but reads will fail if you come across a row that included one of these cases. The only workaround that I've been able to use is to know exactly what the columns/changes were prior to the CF change and iterate through all the rows inserting the same column name with a NULL value. One problem here is that you __must__ absolutely know what the row keys are called because you can't do a read to get them.
> 1) create a secondary index on a column with a validator and a default validator
> 2) insert a row
> 3) read and verify the row
> 4) update the CF/index/name/validator
> 5) read the CF and get an error (CLI or Pycassa)
>
> CLI Commands to create the row and CF/Index
> create column family cf_testing with comparator=UTF8Type and default_validation_class=UTF8Type and column_metadata=[{column_name: colour, validation_class: LongType, index_type: KEYS}];
> set cf_testing['key']['colour']='1234';
> list cf_testing;
> update column family cf_testing with comparator=UTF8Type and default_validation_class=UTF8Type and column_metadata=[{column_name: color, validation_class: LongType, index_type: KEYS}];
>
> ERROR from the CLI:
> list cf_testing;
> Using default limit of 100
> -------------------
> RowKey: key
> invalid UTF8 bytes 00000000000004d2
>
> Here is the Pycassa client code that shows this error too.
> badindex.py
> #!/usr/local/bin/python2.7
> import pycassa
> import uuid
> import sys
>
> def main():
>     try:
>         keyspace="badindex"
>         serverPoolList = ['localhost:9160']
>         pool = pycassa.connect(keyspace, serverPoolList)
>     except:
>         print "couldn't get a connection"
>         sys.exit()
>     cfname="cf_testing"
>     cf = pycassa.ColumnFamily(pool, cfname)
>     results = cf.get_range(start='key', finish='key', row_count=1)
>     for key, columns in results:
>         print key, '=>', columns
>
> if __name__ == "__main__":
>     main()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
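
For readers who want to see why the read fails, here is a rough sketch of the rule described in the resolution: a column listed in column_metadata is checked with its own validation_class, and every other column falls back to default_validation_class. This is a simplified Python 3 model, not Cassandra's actual code. The helper names (validate_long, validate_utf8, validate_bytes, check_column) are made up for illustration, and it assumes the value '1234' was stored as an 8-byte big-endian long, which matches the bytes shown in the CLI error.

```python
import struct

# Hypothetical stand-ins for the server-side validators (illustration only).
def validate_long(raw):
    if len(raw) != 8:
        raise ValueError("expected 8 bytes for a long, got %d" % len(raw))

def validate_utf8(raw):
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("invalid UTF8 bytes " + raw.hex()) from None

def validate_bytes(raw):
    pass  # BytesType accepts any byte sequence

VALIDATORS = {
    "LongType": validate_long,
    "UTF8Type": validate_utf8,
    "BytesType": validate_bytes,
}

def check_column(name, raw, column_metadata, default_validation_class):
    """Use the column's own validator if it is in column_metadata,
    otherwise fall back to default_validation_class."""
    validator_name = column_metadata.get(name, default_validation_class)
    VALIDATORS[validator_name](raw)

# Assumption: 'colour' = 1234 was written while LongType applied to it.
stored = struct.pack(">q", 1234)

# Original definition: 'colour' is in column_metadata, so LongType is used.
check_column("colour", stored, {"colour": "LongType"}, "UTF8Type")

# After the update the metadata names 'color', so 'colour' falls back to
# the UTF8Type default and the stored long no longer validates.
try:
    check_column("colour", stored, {"color": "LongType"}, "UTF8Type")
except ValueError as err:
    print(err)

# First option from the resolution: a BytesType default accepts the old data.
check_column("colour", stored, {"color": "LongType"}, "BytesType")
```

In this model the second option from the resolution (keep the "colour" definition and drop only the index) corresponds to the first check_column call, which passes because the column keeps its LongType validator.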