Message-ID: <1657632350.11171265067978898.JavaMail.jira@brutus.apache.org>
Date: Mon, 1 Feb 2010 23:46:18 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: cassandra-commits@incubator.apache.org
Reply-To: cassandra-dev@incubator.apache.org
Subject: [jira] Assigned: (CASSANDRA-342) hadoop integration
In-Reply-To: <253729204.1249425734789.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/CASSANDRA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-342:
----------------------------------------

    Assignee: Jonathan Ellis  (was: Jeff Hodges)

Here's my first stab at hadoop support. I took Jeff's patches as a starting point, but the many changes we've made to Cassandra's internals since then mean the results are pretty different.

 - BootUp is no longer required; instead we use the Fat Client API
 - Switched to ColumnFamily as the unit for InputFormat, rather than KeySpace
 - Use get_range_slice instead of get_key_range
 - Use Tokens instead of Strings for range splitting
 - Add build.xml and bin/ scripts for WordCount demo

The combination of all this means we get RandomPartitioner support for free. We also get InputSplit location information for free.

My patch 01 and 02 correspond to Jeff's 02 and 03 (no changes to Cassandra internals have been required so far). Then my 03 is just more changes to the WordCount example (I should probably squash that...)

Still todo: breaking a node's range into multiple InputSplits (this will require minor changes to Cassandra)

Also: as I have said before, I don't really know Hadoop, so quite possibly I did something stupid here. (For instance, Jeff's InputFormat used Writable subclasses for both key and value; mine uses just String and ColumnFamily since that is more natural, and the IF contract does not require Writable-ness. Is this Bad Hadoop Form?)
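
To make the remaining to-do item concrete, here is a rough sketch of how one node's token range might be cut into several sub-ranges, each of which could back its own InputSplit. It is not taken from the patches. The class name TokenRangeSplitter, the split method, and RING_SIZE are hypothetical, and the sketch assumes RandomPartitioner-style tokens that are integers in [0, 2^127), with a range written (left, right] and allowed to wrap around the ring.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Cassandra or Hadoop.
class TokenRangeSplitter {
    // Assumption: tokens live on a ring of size 2^127 (RandomPartitioner-style).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /**
     * Splits the range (left, right] into at most `parts` consecutive sub-ranges.
     * Each result is a two-element array {start, end} meaning (start, end].
     * If right <= left the range is treated as wrapping around the ring;
     * left == right is assumed to mean the whole ring.
     */
    static List<BigInteger[]> split(BigInteger left, BigInteger right, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Width of the range, measured clockwise around the ring.
        BigInteger width = right.subtract(left).mod(RING_SIZE);
        if (width.signum() == 0) {
            width = RING_SIZE;
        }
        List<BigInteger[]> result = new ArrayList<>();
        BigInteger prevOffset = BigInteger.ZERO;
        BigInteger n = BigInteger.valueOf(parts);
        for (int i = 1; i <= parts; i++) {
            // Offset of the end of the i-th piece from `left`.
            BigInteger offset = width.multiply(BigInteger.valueOf(i)).divide(n);
            if (offset.equals(prevOffset)) {
                continue; // range is narrower than `parts`; skip empty pieces
            }
            BigInteger start = left.add(prevOffset).mod(RING_SIZE);
            BigInteger end = left.add(offset).mod(RING_SIZE);
            result.add(new BigInteger[] {start, end});
            prevOffset = offset;
        }
        return result;
    }

    public static void main(String[] args) {
        // A range that wraps past the end of the ring, cut into four pieces.
        BigInteger left = RING_SIZE.subtract(BigInteger.valueOf(100));
        BigInteger right = BigInteger.valueOf(100);
        for (BigInteger[] piece : split(left, right, 4)) {
            System.out.println("(" + piece[0] + ", " + piece[1] + "]");
        }
    }
}
```

Because the arithmetic is done on tokens rather than on key strings, the same splitting works no matter how the keys themselves sort. A real implementation would also need the node to serve range slices bounded by these sub-ranges, which is where the changes to Cassandra mentioned above would come in.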

> hadoop integration
> ------------------
>
>                 Key: CASSANDRA-342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-342
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>         Attachments: 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch, 0001-v4-add-basic-hadoop-support-one-split-per-node.txt, 0002-v3-CASSANDRA-342.-Working-hadoop-support.patch, 0002-v4-add-wordcount-hadoop-example.txt, 0003-v3-CASSANDRA-342.-Adding-the-WordCount-example.patch, 0003-v4-add-WordCountSetup-multiple-tests.txt
>
>
> Some discussion on -dev: http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3Cf5f3a6290907240123y22f065edp1649f7c5c1add491@mail.gmail.com%3E

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.