cassandra-commits mailing list archives

From "Brandon Williams (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-3909) Pig should handle wide rows
Date Mon, 16 Apr 2012 14:18:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254712#comment-13254712 ]

Brandon Williams edited comment on CASSANDRA-3909 at 4/16/12 2:17 PM:
----------------------------------------------------------------------

CASSANDRA-3264 (and subsequently CASSANDRA-3883) added wide row support to Hadoop by returning
one column of the row per call.  Pig, however, is fancy enough that it could handle a
wide row in a bag, since bags spill to disk; it just needs the pagination for transport, since
Thrift doesn't stream.  Also, if we returned what CFIF (ColumnFamilyInputFormat) gave us, a user
wanting to work within the row would need another costly M/R job to join the row back to its
original state, so we essentially need to 'undo' the pagination and rebuild the row as a bag.
This patch does that, with the caveat that you cannot access any indexes (and frankly, if you
have indexes on a wide row you're probably doing something wrong), since it's impossible for us
to order the indexes correctly ahead of time in a wide row.
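To make the 'undo the pagination' step concrete, here is a minimal Java sketch, assuming the wide-row reader hands back one (row key, column) pair per call and that all columns of a given row arrive contiguously, so a change of key marks the end of the previous row. `WideRowRegrouper`, `Cell` and `Row` are hypothetical names for illustration only; this is not the code in 3909.txt or in Cassandra's CassandraStorage, and a plain `List` stands in for Pig's spillable bag.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: regroup one-column-per-call wide-row output into whole rows.
public class WideRowRegrouper {

    // One paged item as a wide-row reader might return it: a row key plus a single column.
    public record Cell(String rowKey, String column, String value) {}

    // A rebuilt row: the key plus every column seen for it, in arrival order (the "bag").
    public record Row(String rowKey, List<Cell> bag) {}

    // Groups consecutive cells sharing a row key into one Row.
    // Assumption: all columns of a row are contiguous in the input stream.
    public static List<Row> regroup(Iterator<Cell> cells) {
        List<Row> rows = new ArrayList<>();
        String currentKey = null;
        List<Cell> currentBag = new ArrayList<>();
        while (cells.hasNext()) {
            Cell cell = cells.next();
            // A new key means the previous row is complete; emit it and start a fresh bag.
            if (currentKey != null && !currentKey.equals(cell.rowKey())) {
                rows.add(new Row(currentKey, currentBag));
                currentBag = new ArrayList<>();
            }
            currentKey = cell.rowKey();
            currentBag.add(cell);
        }
        // Flush the last row, if the input was non-empty.
        if (currentKey != null) {
            rows.add(new Row(currentKey, currentBag));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Cell> paged = List.of(
                new Cell("user1", "c1", "a"),
                new Cell("user1", "c2", "b"),
                new Cell("user1", "c3", "c"),
                new Cell("user2", "c1", "d"));
        for (Row row : regroup(paged.iterator())) {
            System.out.println(row.rowKey() + " -> " + row.bag().size() + " columns");
        }
    }
}
```

A real loader would emit each row as soon as the key changes instead of collecting every row in memory, and would put the columns into a Pig `DataBag` so that a very wide row can spill to disk rather than having to fit in the heap.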
                
                  
> Pig should handle wide rows
> ---------------------------
>
>                 Key: CASSANDRA-3909
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.1.1
>
>         Attachments: 3909.txt
>
>
> Pig should be able to use the wide row support in CFIF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
