From: Jonathan Ellis
Date: Wed, 5 May 2010 09:23:20 -0500
Subject: Re: Updating (as opposed to just setting) Cassandra data via Hadoop
To: user@cassandra.apache.org

I'm a little confused. CombineFileInputFormat is designed to combine
multiple small input splits into one larger one. It's not for merging
data (that needs to be part of the reduce phase). Maybe I'm
misunderstanding what you're saying.

On Tue, May 4, 2010 at 10:53 PM, Mark Schnitzius wrote:
> I have a situation where I need to accumulate values in Cassandra on an
> ongoing basis. Atomic increments are still in the works apparently
> (see https://issues.apache.org/jira/browse/CASSANDRA-721) so for the time
> being I'll be using Hadoop, and attempting to feed in both the existing
> values and the new values to a M/R process where they can be combined
> together and written back out to Cassandra.
> The approach I'm taking is to use Hadoop's CombineFileInputFormat to blend
> the existing data (using Cassandra's ColumnFamilyInputFormat) with the newly
> incoming data (using something like Hadoop's SequenceFileInputFormat).
> I was just wondering, has anyone here tried this, and were there issues?
> I'm worried because the CombineFileInputFormat has restrictions around
> splits being from different pools so I don't know how this will play out
> with data from both Cassandra and HDFS. The other option, I suppose, is to
> use a separate M/R process to replicate the data onto HDFS first, but I'd
> rather avoid the extra step and duplication of storage.
> Also, if you've tackled a similar situation in the past using a different
> approach, I'd be keen to hear about it...
>
> Thanks
> Mark

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
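
The point in the reply that merging belongs in the reduce phase can be illustrated with a minimal sketch. This is plain Python that only imitates the map, shuffle, and reduce steps; it is not Hadoop or Cassandra API code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `accumulate`) and the sample keys are hypothetical, and the sketch assumes both inputs can be expressed as (key, integer) pairs that are combined by summing.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


def map_phase(existing: Iterable[Record], incoming: Iterable[Record]) -> Iterator[Record]:
    """Emit records from both sources under the same key space.

    In a real job each source would have its own input format; here both
    are just iterables of (key, value) pairs (an assumption of this sketch).
    """
    for key, value in existing:
        yield key, value
    for key, value in incoming:
        yield key, value


def shuffle(records: Iterable[Record]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle/sort."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Merge the old and new values for each key; this is where combining happens."""
    return {key: sum(values) for key, values in grouped.items()}


def accumulate(existing: Iterable[Record], incoming: Iterable[Record]) -> Dict[str, int]:
    """Run the three steps end to end and return the updated totals."""
    return reduce_phase(shuffle(map_phase(existing, incoming)))


# Hypothetical sample data: stored totals plus newly arrived increments.
existing_totals = [("page:home", 10), ("page:about", 3)]
new_counts = [("page:home", 2), ("page:home", 1), ("page:contact", 5)]
updated = accumulate(existing_totals, new_counts)
```

Because the old and new values meet only after being grouped by key, how the inputs are split does not affect the result; the merge itself is done entirely in the reduce step.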