Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 494327D19 for ; Wed, 3 Aug 2011 18:41:52 +0000 (UTC)
Received: (qmail 50758 invoked by uid 500); 3 Aug 2011 18:41:52 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 50667 invoked by uid 500); 3 Aug 2011 18:41:51 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 50658 invoked by uid 99); 3 Aug 2011 18:41:51 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Aug 2011 18:41:51 +0000
X-ASF-Spam-Status: No, hits=-2000.7 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Aug 2011 18:41:48 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id F3217A6FD3 for ; Wed, 3 Aug 2011 18:41:26 +0000 (UTC)
Date: Wed, 3 Aug 2011 18:41:26 +0000 (UTC)
From: "Brandon Williams (JIRA)"
To: commits@cassandra.apache.org
Message-ID: <1008950117.5403.1312396886992.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1900116949.2078.1309794861900.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

[
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078912#comment-13078912 ]

Brandon Williams commented on CASSANDRA-2855:
---------------------------------------------

skip.empty.results should probably be 'skip.empty.rows' or 'skip.tombstones', and there needs to be a check on the predicate to see whether it covers the entire row: if so, suppress the tombstone; if not, return the empty slice.

> Skip rows with empty columns when slicing entire row
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2855
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>             Fix For: 0.8.4
>
>         Attachments: 2855-v2.txt, 2855-v3.txt
>
>
> We have been finding that range ghosts appear in results from Hadoop via Pig.  This could also happen if rows don't have data for the slice predicate that is given.  This leads to having to do a painful amount of defensive checking on the Pig side, especially in the case of range ghosts.
> We would like to add an option to skip rows that have no column values in them.  That functionality existed before in core Cassandra but was removed because of the performance penalty of that checking.  However, the Hadoop support in the RecordReader is batch oriented anyway, so individual row reading performance isn't as much of an issue.  Also, we would make it an optional config parameter for each job, so people wouldn't have to incur that penalty if they are confident that there won't be any empty rows or they don't care.
> It could be a parameter, cassandra.skip.empty.rows, set to true or false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
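
For illustration only, the predicate check suggested in the comment above might look roughly like the sketch below. It is not taken from the attached patches. SliceRange here is a simplified, hypothetical stand-in for the real slice predicate, filter_rows and skip_empty_rows are made-up names, and the sketch assumes that a range with empty start and finish bounds means "the entire row".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

# A row is modelled as (key, columns); an empty dict means no live columns.
Row = Tuple[bytes, Dict[bytes, bytes]]


@dataclass(frozen=True)
class SliceRange:
    """Hypothetical, simplified slice range; empty bounds mean unbounded."""
    start: bytes = b""
    finish: bytes = b""

    def covers_entire_row(self) -> bool:
        # Assumption: both bounds empty means the slice spans the whole row.
        return self.start == b"" and self.finish == b""


def filter_rows(rows: Iterable[Row],
                predicate: SliceRange,
                skip_empty_rows: bool) -> Iterator[Row]:
    """Drop empty rows only when the option is on AND the slice is whole-row."""
    for key, columns in rows:
        if not columns and skip_empty_rows and predicate.covers_entire_row():
            # Whole-row slice with no columns: treat as a range ghost, skip it.
            continue
        # Either the row has data, or the slice was partial and the empty
        # result is meaningful, so it is returned as-is.
        yield key, columns
```

With the option off, or with a partial slice, empty rows pass through unchanged, which matches the "if not return the empty slice" part of the suggestion.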