incubator-cassandra-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Cassandra w/ Hadoop
Date Thu, 19 Aug 2010 18:14:57 GMT
  On 8/19/10 10:23 AM, Jeremy Hanna wrote:
> I would check out http://wiki.apache.org/cassandra/HadoopSupport for more info.  I'll try to explain a bit more here, but I don't think there's a tutorial out there yet.
>
> For input:
> - configure your main class where you're starting the mapreduce job the way the word_count is configured (with either storage-conf or in your code via the ConfigHelper).  It will complain specifically about stuff you hadn't configured - esp. important is your cassandra server and port.
> - the inputs to your mapper are going to be what's coming from cassandra - so your key with a map of row values
> - you need to set your column name in your overridden setup method in your mapper
> - for the reducer, nothing really changes from a normal map/reduce, unless you want to output to cassandra
> - generally cassandra just provides an inputformat and split classes to read from cassandra - you can find the guts in the org.apache.cassandra.hadoop package
>
> For output:
> - in your reducer, you could just write to cassandra directly via thrift.  there is a built-in outputformat coming in 0.7 but it still might change before 0.7 final - that will queue up changes so it will write large blocks all at once.
>
>
> On Aug 19, 2010, at 12:07 PM, Mark wrote:
>
>> Are there any examples/tutorials on the web for reading/writing from Cassandra into/from Hadoop?
>>
>> I found the example in contrib/word_count but I really can't make sense of it... a tutorial/explanation would help.
Thanks!
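
For anyone following along, here is a rough sketch of the data flow described in the input notes quoted above: the mapper is called once per row with the row key and a map of that row's columns, it picks out the one column chosen at setup time, and the reducer is an ordinary word-count reducer. This is deliberately dependency-free. Plain Java collections stand in for the Hadoop and Cassandra types, and the class name WordCountSketch, the method names, and the column name "text" are all made up for illustration. It is not the org.apache.cassandra.hadoop API and not the contrib/word_count code itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for a Cassandra-backed word count. Nothing here
// touches Hadoop or Cassandra; it only mirrors the shapes of the data.
public class WordCountSketch {

    // "Map" step: called once per row. Input is the row key plus a sorted map
    // of that row's columns (column name -> raw value). Only the column chosen
    // up front (the job a real mapper would do in setup) is read.
    public static List<String> map(String rowKey,
                                   SortedMap<String, byte[]> columns,
                                   String columnName) {
        List<String> words = new ArrayList<>();
        byte[] value = columns.get(columnName);
        if (value == null) {
            return words; // this row does not have the requested column
        }
        String text = new String(value, StandardCharsets.UTF_8);
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // "Reduce" step: nothing Cassandra-specific, just sum occurrences per word.
    public static SortedMap<String, Integer> reduce(List<String> words) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Drives map over every row, then reduces the combined output.
    public static SortedMap<String, Integer> run(
            Map<String, SortedMap<String, byte[]>> rows, String columnName) {
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, byte[]>> row : rows.entrySet()) {
            emitted.addAll(map(row.getKey(), row.getValue(), columnName));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        Map<String, SortedMap<String, byte[]>> rows = new LinkedHashMap<>();

        SortedMap<String, byte[]> first = new TreeMap<>();
        first.put("text", "hello world".getBytes(StandardCharsets.UTF_8));
        rows.put("key1", first);

        SortedMap<String, byte[]> second = new TreeMap<>();
        second.put("text", "hello cassandra".getBytes(StandardCharsets.UTF_8));
        second.put("other", "ignored words".getBytes(StandardCharsets.UTF_8));
        rows.put("key2", second);

        // Counts words found in the "text" column across both rows.
        System.out.println(run(rows, "text"));
    }
}
```

In a real job the row/column map would be handed to the mapper by the input format from the org.apache.cassandra.hadoop package, and the column name would come from the job configuration rather than a method argument. The exact key and value types depend on the Cassandra version, so check contrib/word_count for the real signatures.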
