cassandra-commits mailing list archives

From "Jeremy Hanna (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row
Date Fri, 11 Nov 2011 23:24:51 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148832#comment-13148832 ]

Jeremy Hanna commented on CASSANDRA-2855:
-----------------------------------------

FWIW, I saw an interesting analogous ticket for HBase storage: https://issues.apache.org/jira/browse/PIG-2114
It talks about omitNulls and how it's used on both the load side and the store side.
                
> Skip rows with empty columns when slicing entire row
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2855
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Jeremy Hanna
>            Assignee: T Jake Luciani
>            Priority: Minor
>              Labels: hadoop
>             Fix For: 0.8.8
>
>         Attachments: 2855-v2.txt, 2855-v3.txt, 2855-v4.txt, 2855-v5.txt, v1-0001-CASSANDRA-2855-ignore-ghosts-when-no-predicate-specifi.txt
>
>
> We have been finding that range ghosts appear in results from Hadoop via Pig.  This could
> also happen if rows don't have data for the slice predicate that is given.  This leads to
> having to do a painful amount of defensive checking on the Pig side, especially in the case
> of range ghosts.
> We would like to add an option to skip rows that have no column values.  That functionality
> existed before in core Cassandra but was removed because of the performance penalty of that
> checking.  However, the Hadoop support in the RecordReader is batch oriented anyway,
> so individual row reading performance isn't as much of an issue.  We would also make it an
> optional config parameter for each job, so people wouldn't have to incur that penalty
> if they are confident that there won't be any empty rows or they don't care.
> It could be a parameter named cassandra.skip.empty.rows that takes true/false.
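
For illustration only, here is a minimal sketch of the behaviour the description asks for. It assumes rows arrive as (key, columns) pairs and that the job configuration is a plain dict of strings; `read_rows`, the `Row` alias, and the default of "false" are hypothetical and are not the Cassandra RecordReader API. Only the parameter name cassandra.skip.empty.rows comes from the ticket.

```python
from typing import Dict, Iterable, Iterator, Tuple

# A row is modelled as (row key, {column name: value}); a range ghost
# shows up as a key whose column map is empty.
Row = Tuple[str, Dict[str, bytes]]

# Parameter name proposed in the ticket description.
SKIP_EMPTY_ROWS_KEY = "cassandra.skip.empty.rows"


def read_rows(rows: Iterable[Row], job_conf: Dict[str, str]) -> Iterator[Row]:
    """Yield rows, dropping those with no columns when the option is on.

    Assumption: the option defaults to "false" so that jobs which do not
    set it pay no extra checking cost and see unchanged behaviour.
    """
    skip_empty = job_conf.get(SKIP_EMPTY_ROWS_KEY, "false").lower() == "true"
    for key, columns in rows:
        if skip_empty and not columns:
            continue  # range ghost, or no data for the slice predicate
        yield key, columns


rows = [("k1", {"name": b"a"}), ("ghost", {}), ("k2", {"name": b"b"})]
kept = [key for key, _ in read_rows(rows, {SKIP_EMPTY_ROWS_KEY: "true"})]
```

With the option set to "true" the ghost row is filtered out before it reaches the Pig side; with the option unset, all three rows are passed through as before.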


        
