cassandra-commits mailing list archives

From "Jeremy Hanna (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-3628) Make Pig/CassandraStorage delete functionality disabled by default and configurable
Date Thu, 15 Dec 2011 18:08:30 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-3628:
------------------------------------

    Attachment: 3628.txt

Split out the conditions so that a null value results in a no-op.  I'm not 100% certain that's
the desired behavior: do we want to skip nulls, or do we want to write an empty value instead?
If we want to write an empty value, we have to convert the null to an empty value first,
because passing the null through causes NPEs.

For our purposes, we want to skip columns whose values are null.  In our own code we also log
the column family name and the column name when that happens, but that adds a lot of logging,
so it might be best left up to the user.  Maybe people want it, though.
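
The choice discussed above (skip a null, write an empty value, or delete) can be sketched as a
small decision function.  This is only an illustration of the logic, not the code in 3628.txt:
the function name, the action labels, and the allow_deletes and null_as_empty flags are all
hypothetical and do not come from CassandraStorage.

```python
from typing import Optional, Tuple

# Hypothetical action labels; none of these names come from CassandraStorage.
SKIP, WRITE, DELETE = "skip", "write", "delete"


def plan_column_write(value: Optional[bytes],
                      allow_deletes: bool = False,
                      null_as_empty: bool = False) -> Tuple[str, Optional[bytes]]:
    """Decide what to do with one column value of an output tuple."""
    if value is not None:
        # Normal case: write the value as-is.
        return (WRITE, value)
    if allow_deletes:
        # Deletes were explicitly enabled, so a null means "delete this column".
        return (DELETE, None)
    if null_as_empty:
        # Convert the null to an empty value so nothing downstream sees a null.
        return (WRITE, b"")
    # Default: a null is a no-op and the existing column is left untouched.
    return (SKIP, None)
```

With both flags left at their defaults, a null value is skipped, which is the behavior we want
for our own jobs.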
                
> Make Pig/CassandraStorage delete functionality disabled by default and configurable
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3628
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3628
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>              Labels: pig
>             Fix For: 1.0.7, 1.1
>
>         Attachments: 3628.txt
>
>
> Right now, there is a way to delete a column with the CassandraStorage loadstorefunc.
> In practice it is a bad idea to have that enabled by default.  A scenario: do an outer join
> and you don't have a value for something and then you write out to cassandra all of the
> attributes of that relation.  You've just inadvertently deleted a column for all the rows
> that didn't have that value as a result of the outer join.  It can be argued that you want
> to be careful with how you project after the join.  However, I would think disabling by
> default and having a configurable property to enable it for the instances when you
> explicitly want to use it is the right plan.
> Fwiw, we had a bug in one of our scripts that did exactly as described above.  It's good
> to fix the bug.  It's bad to implicitly delete data.
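
The outer-join scenario in the description can be reproduced with plain dictionaries.  This is
a toy model under stated assumptions, not Pig or Cassandra code: left_outer_join,
columns_deleted, the allow_deletes flag, and the sample data are all made up for illustration.

```python
def left_outer_join(left, right, column):
    """Join per-row columns in `left` with values in `right`, keyed by row key.

    Rows with no match in `right` get None for `column`, as in an outer join.
    """
    return {key: {**cols, column: right.get(key)} for key, cols in left.items()}


def columns_deleted(rows, allow_deletes):
    """List the (row key, column name) pairs that storing `rows` would delete."""
    if not allow_deletes:
        # With deletes disabled, null values never turn into deletions.
        return []
    return [(key, name)
            for key, cols in rows.items()
            for name, value in cols.items()
            if value is None]


users = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"}}
emails = {"alice": "alice@example.com"}  # no entry for bob

joined = left_outer_join(users, emails, "email")
print(columns_deleted(joined, allow_deletes=True))
print(columns_deleted(joined, allow_deletes=False))
```

With deletes enabled, the unmatched row is reported as losing its "email" column even though
the script never asked for a delete; with deletes disabled, nothing is removed.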

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
