cassandra-commits mailing list archives

From "Brandon Williams (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3045) Update ColumnFamilyOutputFormat to use new bulkload API
Date Sun, 23 Oct 2011 15:48:32 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133671#comment-13133671 ]

Brandon Williams commented on CASSANDRA-3045:
---------------------------------------------

This isn't as easy as it seems.  Bulk loading this way requires becoming a fat client.  Since
Hadoop is colocated with Cassandra, the fat client would share an IP with the local node, so
we would have to divorce the "ip == node" marriage.  This means rewriting most of how gossip
works, adding the port for the storage proto (and thus allowing port divergence, an idea we
have not been fond of in the past), and modifying MessagingService,
Incoming/OutgoingTcpConnection, and probably other classes that are notoriously hairy.
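
To illustrate the "ip == node" problem, here is a minimal sketch, not Cassandra's actual gossip
code.  The class and method names are hypothetical, the 10.0.0.1 address is made up, and port 7100
for the Hadoop task is an arbitrary choice (7000 is the default storage port).  It only shows that
a membership set keyed on IP alone cannot tell a colocated fat client from the local node, while
one keyed on IP and port can.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Cassandra.
public class EndpointIdentitySketch {

    // Builds an endpoint on one fixed, made-up host address with the given port.
    static InetSocketAddress endpoint(int port) {
        try {
            InetAddress host = InetAddress.getByAddress(new byte[] {10, 0, 0, 1});
            return new InetSocketAddress(host, port);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    // Membership keyed on IP only: this is the "ip == node" assumption.
    static int distinctByIp(InetSocketAddress... endpoints) {
        Set<InetAddress> members = new HashSet<>();
        for (InetSocketAddress e : endpoints) {
            members.add(e.getAddress());
        }
        return members.size();
    }

    // Membership keyed on IP and port: what "adding the port" would permit.
    static int distinctByIpAndPort(InetSocketAddress... endpoints) {
        Set<InetSocketAddress> members = new HashSet<>();
        for (InetSocketAddress e : endpoints) {
            members.add(e);
        }
        return members.size();
    }

    public static void main(String[] args) {
        InetSocketAddress cassandraNode = endpoint(7000); // default storage port
        InetSocketAddress hadoopTask = endpoint(7100);    // arbitrary port for the fat client

        System.out.println("keyed by ip:        " + distinctByIp(cassandraNode, hadoopTask));
        System.out.println("keyed by ip + port: " + distinctByIpAndPort(cassandraNode, hadoopTask));
    }
}
```

Keyed by IP, the two endpoints collapse into a single member; keyed by IP and port, they stay
distinct.  The real change would be far more invasive than swapping a key type, since the IP-only
assumption runs through gossip and the messaging classes named above.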

That is a lot of work, very difficult to make backwards-compatible, and we really don't know
what, if any, sort of gains we'll see using this method afterwards.  I'm personally very strongly
-1 on making these changes to gossip since I feel like it is finally fairly stable.

Even in a non-colocated setup, the task JVMs would still need to respect RING_DELAY, which
might be enough to erode any gains that this could provide in many scenarios.

One option might be to speak the storage proto directly to the local C* instance, but add
some kind of logic that says 'this is not a node nor a fat client, just accept writes/reads
from it and nothing else.'
                
> Update ColumnFamilyOutputFormat to use new bulkload API
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3045
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3045
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Jonathan Ellis
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 1.1
>
>
> The bulk loading interface added in CASSANDRA-1278 is a great fit for Hadoop jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
