cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Assigned: (CASSANDRA-342) hadoop integration
Date Mon, 01 Feb 2010 23:46:18 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-342:
----------------------------------------

    Assignee: Jonathan Ellis  (was: Jeff Hodges)

Here's my first stab at hadoop support.  I took Jeff's patches as a starting point, but the
many changes we've made to Cassandra's internals since then mean the results are pretty different.
 - BootUp is no longer required; instead we use the Fat Client API
 - Switched to ColumnFamily as the unit for InputFormat, rather than KeySpace
 - Use get_range_slice instead of get_key_range
 - Use Tokens instead of Strings for range splitting
 - Add build.xml and bin/ scripts for WordCount demo

The combination of all this means we get RandomPartitioner support for free.  We also get
InputSplit location information for free.
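
To illustrate why splitting on Tokens rather than key Strings helps here: under RandomPartitioner
a row's place on the ring comes from a hash of its key, not from the key's sort order, so only a
token range maps onto a contiguous slice of a node's data.  The sketch below is not code from the
patch.  It assumes an MD5-derived, non-negative BigInteger token and a (start, end] range that may
wrap around the ring; TokenSketch, token() and inRange() are hypothetical names for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TokenSketch {
    // Assumption: the token is the MD5 digest of the key, read as a
    // signed BigInteger and made non-negative with abs().
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // True if t lies in the range (start, end], where the range wraps
    // around the ring when start >= end.
    static boolean inRange(BigInteger start, BigInteger end, BigInteger t) {
        if (start.compareTo(end) < 0) {
            return t.compareTo(start) > 0 && t.compareTo(end) <= 0;
        }
        return t.compareTo(start) > 0 || t.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        // Keys that sort next to each other as strings generally land far apart as tokens.
        for (String key : new String[] { "user1", "user2", "user3" }) {
            System.out.println(key + " -> " + token(key));
        }
    }
}
```

A split defined as a token range can then be matched against the range each node owns, which is
also what makes location information available per split.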

My patch 01 and 02 correspond to Jeff's 02 and 03 (no changes to Cassandra internals have
been required so far).  Then my 03 is just more changes to the WordCount example (I should
probably squash that...)

Still todo: breaking a node's range into multiple InputSplits (this will require minor changes
to Cassandra)
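
For discussion, here is one naive way a node's token range could be cut into several sub-ranges
of equal token width.  This is only a sketch under stated assumptions, not what the patches do:
it assumes a ring of size 2^127, ranges written as (start, end] that may wrap, and the
hypothetical names RangeSplitSketch and split().  Equal token width does not guarantee equal row
counts per split.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class RangeSplitSketch {
    // Assumption: tokens live on a ring of size 2^127.
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    // Cut (start, end] into n consecutive sub-ranges, each returned as {left, right}.
    static List<BigInteger[]> split(BigInteger start, BigInteger end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // Clockwise width of the range; start == end is treated as the full ring.
        BigInteger width = end.subtract(start).mod(RING);
        if (width.signum() == 0) {
            width = RING;
        }
        if (BigInteger.valueOf(n).compareTo(width) > 0) {
            throw new IllegalArgumentException("range too narrow for " + n + " splits");
        }
        List<BigInteger[]> out = new ArrayList<>();
        BigInteger left = start;
        for (int i = 1; i <= n; i++) {
            // The last boundary is exactly end, so rounding never loses tokens.
            BigInteger right = (i == n)
                ? end
                : start.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n))).mod(RING);
            out.add(new BigInteger[] { left, right });
            left = right;
        }
        return out;
    }

    public static void main(String[] args) {
        for (BigInteger[] r : split(BigInteger.ZERO, BigInteger.valueOf(100), 4)) {
            System.out.println("(" + r[0] + ", " + r[1] + "]");
        }
    }
}
```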

Also: as I have said before, I don't really know Hadoop, so quite possibly I did something
stupid here.  (For instance, Jeff's InputFormat used Writable subclasses for both key and
value; mine uses just String and ColumnFamily since that is more natural, and the InputFormat
contract does not require Writable-ness.  Is this Bad Hadoop Form?)

> hadoop integration
> ------------------
>
>                 Key: CASSANDRA-342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-342
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>         Attachments: 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch,
>                      0001-v4-add-basic-hadoop-support-one-split-per-node.txt,
>                      0002-v3-CASSANDRA-342.-Working-hadoop-support.patch,
>                      0002-v4-add-wordcount-hadoop-example.txt,
>                      0003-v3-CASSANDRA-342.-Adding-the-WordCount-example.patch,
>                      0003-v4-add-WordCountSetup-multiple-tests.txt
>
>
> Some discussion on -dev: http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3Cf5f3a6290907240123y22f065edp1649f7c5c1add491@mail.gmail.com%3E

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

