incubator-cassandra-user mailing list archives

From Christian Decker
Subject Re: Cassandra w/ Hadoop
Date Thu, 19 Aug 2010 17:34:35 GMT
If, like me, you prefer to write your jobs on the fly, try taking a look at
Pig. Cassandra provides a LoadFunc under contrib/pig/ in the source package
that lets you load data directly from Cassandra.
Christian Decker
Software Architect

On Thu, Aug 19, 2010 at 7:23 PM, Jeremy Hanna wrote:

> I would check out for more
> info.  I'll try to explain a bit more here, but I don't think there's a
> tutorial out there yet.
> For input:
> - configure the main class where you start the mapreduce job the way
> the word_count example is configured (either with storage-conf or in your
> code via the ConfigHelper).  It will complain specifically about anything
> you haven't configured - esp. important are your cassandra server and port.
> - the inputs to your mapper are what comes from cassandra - that is, your
> row key with a map of the row's column values
> - you need to set your column name in your overridden setup method in your
> mapper
> - for the reducer, nothing really changes from a normal map/reduce, unless
> you want to output to cassandra
> - generally cassandra just provides an inputformat and split classes to
> read from cassandra - you can find the guts in the
> org.apache.cassandra.hadoop package
> For output:
> - in your reducer, you could just write to cassandra directly via thrift.
> There is a built-in outputformat coming in 0.7, but it still might change
> before 0.7 final - it will queue up changes and write them in large blocks
> all at once.
> On Aug 19, 2010, at 12:07 PM, Mark wrote:
> > Are there any examples/tutorials on the web for reading/writing from
> Cassandra into/from Hadoop?
> >
> > I found the example in contrib/word_count but I really can't make sense
> of it... a tutorial/explanation would help.
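
For anyone trying to picture the input side described in the quoted steps, here is a minimal sketch in plain Java. It deliberately does not use the real Hadoop or Cassandra classes: RowMapperSketch, the "columnname" configuration key, and the String-based row map are all made up for illustration, and the real types in contrib/word_count and org.apache.cassandra.hadoop will differ. It only shows the shape of the flow: pick the column name in setup, then have map receive a row key plus a sorted map of that row's columns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java stand-in for a Hadoop mapper fed by Cassandra's input format.
// Hypothetical names throughout; nothing here touches the real APIs.
class RowMapperSketch {
    // Column to read from each row, chosen once per task in setup().
    private String columnName;

    // Stands in for the overridden setup(): read the column name from the
    // job configuration (the "columnname" key is an assumption).
    void setup(Map<String, String> conf) {
        columnName = conf.get("columnname");
    }

    // Stands in for map(): the key is the row key, and the value is a sorted
    // map of column name -> column value for that row.
    void map(String rowKey, SortedMap<String, String> columns, Map<String, Integer> out) {
        String value = columns.get(columnName);
        if (value == null) {
            return; // this row does not have the column we asked for
        }
        StringTokenizer tokens = new StringTokenizer(value);
        while (tokens.hasMoreTokens()) {
            // Word-count style output: add one for every token seen.
            out.merge(tokens.nextToken(), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("columnname", "text");

        RowMapperSketch mapper = new RowMapperSketch();
        mapper.setup(conf);

        // Two fake rows standing in for what the input format would hand over.
        SortedMap<String, String> row1 = new TreeMap<>();
        row1.put("text", "hello world");
        SortedMap<String, String> row2 = new TreeMap<>();
        row2.put("text", "hello cassandra");

        Map<String, Integer> counts = new TreeMap<>();
        mapper.map("row1", row1, counts);
        mapper.map("row2", row2, counts);
        System.out.println(counts);
    }
}
```

In a real job the reducer would then sum these counts as in any other map/reduce; only the mapper's input types are Cassandra-specific.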
