cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-789) Add configurable range sizes, paging to hadoop range queries
Date Wed, 17 Mar 2010 13:30:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846410#action_12846410 ]

Jonathan Ellis commented on CASSANDRA-789:
------------------------------------------

+1

> Add configurable range sizes, paging to hadoop range queries
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-789
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: CASSANDRA-789.patch, CASSANDRA-789.patch
>
>
> For very large numbers of keys (billions), the current hardcoded 4096 keys per InputSplit could cause the split generator to OOM, since all splits are held in memory at once.  So we want to make two changes:
>  1) make the number of keys configurable*
>  2) make the record reader page instead of assuming it can read all rows into memory at once
> Note: going back to specifying the number of splits instead of the number of keys is bad for two reasons.  First, it does not work with the standard Hadoop mapred.min.split.size configuration option.  Second, it leaves us no way of measuring progress in the record reader, since we have no idea how many keys are in the split.  If we specify the number of keys, then even if we page we know (to within a small margin of error) how many keys to expect.
> See CASSANDRA-775, CASSANDRA-342 for background.
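
The paging and progress idea in the description can be illustrated with a minimal sketch. This is not the code from the attached patch. The class name PagedSplitReader, its fields, and its methods are hypothetical, and an in-memory sorted set stands in for a range query against the cluster. It assumes startKey <= endKey and that expectedKeys is the configured keys-per-split estimate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical sketch of a paging reader for one split; not the CASSANDRA-789 patch. */
class PagedSplitReader {
    private final NavigableSet<String> store; // stand-in for a remote range query
    private final String startKey;            // inclusive start of the split
    private final String endKey;              // inclusive end of the split
    private final int pageSize;               // maximum keys held in memory at once
    private final int expectedKeys;           // configured keys per split (an estimate)
    private String lastKey = null;            // last key returned so far
    private int keysRead = 0;
    private boolean exhausted = false;

    PagedSplitReader(NavigableSet<String> store, String startKey, String endKey,
                     int pageSize, int expectedKeys) {
        this.store = store;
        this.startKey = startKey;
        this.endKey = endKey;
        this.pageSize = pageSize;
        this.expectedKeys = expectedKeys;
    }

    /** Returns at most pageSize keys, resuming just after the last key already returned. */
    List<String> nextPage() {
        List<String> page = new ArrayList<>();
        if (exhausted) {
            return page;
        }
        NavigableSet<String> remaining = (lastKey == null)
                ? store.subSet(startKey, true, endKey, true)
                : store.subSet(lastKey, false, endKey, true);
        for (String key : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        if (page.size() < pageSize) {
            exhausted = true; // a short page means the split has no more keys
        }
        if (!page.isEmpty()) {
            lastKey = page.get(page.size() - 1);
            keysRead += page.size();
        }
        return page;
    }

    /** Progress is keys read over expected keys, capped at 1 because the count is approximate. */
    float getProgress() {
        if (exhausted) {
            return 1.0f;
        }
        return Math.min(1.0f, (float) keysRead / expectedKeys);
    }
}
```

Because the split is defined by an expected key count rather than by a number of splits, getProgress() has a denominator to work with even though only one page is in memory at a time.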

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

