Return-Path:
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: (qmail 65796 invoked from network); 25 May 2010 12:33:47 -0000
Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 25 May 2010 12:33:47 -0000
Received: (qmail 67499 invoked by uid 500); 25 May 2010 12:33:47 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 67471 invoked by uid 500); 25 May 2010 12:33:46 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 67462 invoked by uid 99); 25 May 2010 12:33:45 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 May 2010 12:33:45 +0000
X-ASF-Spam-Status: No, hits=-1461.8 required=10.0 tests=ALL_TRUSTED,AWL
X-Spam-Check-By: apache.org
Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 May 2010 12:33:45 +0000
Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id o4PCXOTT022649 for ; Tue, 25 May 2010 12:33:25 GMT
Message-ID: <21124730.32291274790804974.JavaMail.jira@thor>
Date: Tue, 25 May 2010 08:33:24 -0400 (EDT)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] Commented: (CASSANDRA-1050) Too many splits for ColumnFamily with only a few rows
In-Reply-To: <12620774.45721273001517097.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CASSANDRA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871113#action_12871113 ]

Jonathan Ellis commented on CASSANDRA-1050:
-------------------------------------------

btw, if you build the Thrift code with "ant gen-thrift-java", it will re-run rat for you to avoid blowing away the Apache license headers in the generated code.

> Too many splits for ColumnFamily with only a few rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-1050
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1050
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.6
>            Reporter: Joost Ouwerkerk
>            Assignee: Johan Oskarsson
>             Fix For: 0.7
>
>         Attachments: CASSANDRA-0.6-1050.patch, CASSANDRA-1050.patch
>
>
> ColumnFamilyInputFormat creates splits for the entire Keyspace. If one ColumnFamily has 100 million rows and another has only 100 rows, the number of splits will be 1,526 (assuming 64k rows per split) for either one, since it is based on the total number of unique keys across the whole keyspace, and not on the number of rows in the ColumnFamily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
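
For reference, the arithmetic in the issue description can be checked with a small sketch. This is not Cassandra's code: `split_count`, `ROWS_PER_SPLIT`, and the column family names are hypothetical, "64k" is assumed to mean 65,536 rows, and the keyspace-wide key count is assumed to be the plain sum of the two row counts (no keys shared between column families).

```python
import math

# Assumption: "64k rows per split" in the issue text means 64 * 1024 rows.
ROWS_PER_SPLIT = 64 * 1024


def split_count(row_count, rows_per_split=ROWS_PER_SPLIT):
    """Number of splits needed to cover row_count rows (at least one)."""
    return max(1, math.ceil(row_count / rows_per_split))


# Hypothetical column families matching the row counts in the description.
column_families = {"large_cf": 100_000_000, "small_cf": 100}

# Assumption: no key appears in both column families, so the keyspace-wide
# count of unique keys is simply the sum of the per-column-family counts.
keyspace_total = sum(column_families.values())

# Behaviour described in the report: every column family gets a split count
# derived from the keyspace-wide total.
reported = {name: split_count(keyspace_total) for name in column_families}

# Behaviour the reporter expects: the count is derived per column family.
expected = {name: split_count(rows) for name, rows in column_families.items()}

print("reported:", reported)
print("expected:", expected)
```

Under these assumptions the keyspace-wide calculation gives both column families the same split count, while a per-column-family calculation leaves the 100-row column family with a single split.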