cassandra-commits mailing list archives

From "Jeremy Hanna (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1050) Too many splits for ColumnFamily with only a few rows
Date Fri, 21 May 2010 17:01:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870052#action_12870052 ]

Jeremy Hanna commented on CASSANDRA-1050:
-----------------------------------------

Johan - I applied this patch on my local trunk and ran the word count on it. I get perfect
results on all outputs except /tmp/wordcount3, which gets 1006 instead of 1000.  It looks like
the patch resolves many of the issues that were happening with CASSANDRA-1042, though.

> Too many splits for ColumnFamily with only a few rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-1050
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1050
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.6
>            Reporter: Joost Ouwerkerk
>             Fix For: 0.6.2
>
>         Attachments: CASSANDRA-1050.patch
>
>
> ColumnFamilyInputFormat creates splits for the entire Keyspace.  If one ColumnFamily
> has 100 million rows and another has only 100 rows, the number of splits will be 1,526
> (assuming 64k rows per split) for either one, since it is based on the total number of unique
> keys across the whole keyspace, and not on the number of rows in the ColumnFamily.
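
The arithmetic behind the 1,526 figure can be checked with a rough sketch. This is not the ColumnFamilyInputFormat code: split_count is a hypothetical helper, "64k" is read as 65,536 rows, and the two ColumnFamilies are assumed to share no keys, so the keyspace total is simply the sum of their row counts.

```python
# Illustrative arithmetic only; not taken from ColumnFamilyInputFormat.
ROWS_PER_SPLIT = 64 * 1024  # assumption: "64k rows per split" means 65,536


def split_count(row_estimate, rows_per_split=ROWS_PER_SPLIT):
    """Hypothetical helper: splits needed to cover row_estimate rows (minimum 1)."""
    # Integer ceiling division avoids floating-point rounding.
    return max(1, (row_estimate + rows_per_split - 1) // rows_per_split)


large_cf_rows = 100_000_000
small_cf_rows = 100

# Reported behaviour: every ColumnFamily is sized from the keyspace-wide key count.
# Assumption: the two ColumnFamilies have disjoint keys.
keyspace_keys = large_cf_rows + small_cf_rows
print("splits from keyspace total:", split_count(keyspace_keys))

# Per-ColumnFamily sizing, which is what the report argues for.
print("splits for large CF:", split_count(large_cf_rows))
print("splits for small CF:", split_count(small_cf_rows))
```

Under these assumptions the keyspace-wide estimate yields 1,526 splits for both ColumnFamilies, while a per-ColumnFamily estimate would give the 100-row ColumnFamily a single split.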

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

