Message-ID: <4A9D7A79.8060904@oskarsson.nu>
Date: Tue, 01 Sep 2009 20:48:09 +0100
From: Johan Oskarsson
To: cassandra-user@incubator.apache.org
CC: cassandra-dev@incubator.apache.org
Subject: Re: Cassandra + Hadoop + BMT

I have slapped together a basic CassandraOutputFormat for Hadoop 0.18,
based on the code Chris put up.

Usage:

    conf.setOutputKeyClass(RowColumn.class);
    conf.setOutputValueClass(BytesWritable.class);
    conf.setOutputFormat(CassandraOutputFormat.class);
    conf.set(CassandraOutputFormat.CONF_COLUMN_FAMILY_NAME, "columnfamilyname");
    conf.set(CassandraOutputFormat.CONF_KEYSPACE, "keyspacename");
    DistributedCache.addCacheFile(new URI("uri_to_storage-conf.xml"), conf);

plus your job-specific settings.

Then, after the job has finished, run this method:

    CassandraOutputFormat.forceFlush

Source code here:
http://github.com/johanoskarsson/cassandraoutputformat/tree/master

Big thanks to Chris for figuring out the mystery that is BinaryMemtable.

/Johan

Chris Goffinet wrote:
> Hi Guys
>
> This is long overdue, but I have posted a very rough example (with
> Digg stuff removed) for getting BMT working with Cassandra. Patches are
> coming next for the JIRA tickets. I'll try to get a more generic
> map/reduce job finished by end of the week that integrates Hive output.
>
> http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master
>
> -Chris