cassandra-user mailing list archives

From Mark <>
Subject Re: Cassandra w/ Hadoop
Date Fri, 20 Aug 2010 01:58:43 GMT
  On 8/19/10 11:14 AM, Mark wrote:
>  On 8/19/10 10:23 AM, Jeremy Hanna wrote:
>> I would check out for 
>> more info.  I'll try to explain a bit more here, but I don't think 
>> there's a tutorial out there yet.
>> For input:
>> - configure the main class where you start the mapreduce job the 
>> way the word_count example is configured (either via storage-conf 
>> or in your code via the ConfigHelper).  It will complain 
>> specifically about anything you haven't configured - especially 
>> important are your cassandra server and port.
>> - the inputs to your mapper are going to be what's coming from 
>> cassandra - so your key with a map of row values
>> - you need to set your column name in your overridden setup method in 
>> your mapper
>> - for the reducer, nothing really changes from a normal map/reduce, 
>> unless you want to output to cassandra
>> - generally cassandra just provides an inputformat and split classes 
>> to read from cassandra - you can find the guts in the 
>> org.apache.cassandra.hadoop package
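The input-side steps above can be sketched roughly as below, following the shape of the contrib/word_count job. This is an illustrative sketch only: the keyspace/column family names (`Keyspace1`, `Standard1`) and the column name (`text`) are placeholders, and the exact `ConfigHelper` setters and mapper type parameters vary between the 0.6 and 0.7 APIs, so check them against the `org.apache.cassandra.hadoop` package in your version.

```java
// Illustrative sketch (not compilable standalone -- requires the
// cassandra and hadoop jars). Names follow the 0.6-era
// contrib/word_count example and may differ in your version.

// --- driver: tell the InputFormat where to read from ---
Job job = new Job(getConf(), "cassandra-input-example");
job.setInputFormatClass(ColumnFamilyInputFormat.class);
// keyspace + column family to read (placeholders here)
ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");
// the server address/port setters are version-dependent; in some
// versions these come from storage-conf instead of ConfigHelper

// --- mapper: input is a row key plus a map of that row's columns ---
public static class MyMapper
        extends Mapper<String, SortedMap<byte[], IColumn>, Text, LongWritable> {
    private byte[] columnName;

    @Override
    protected void setup(Context context) {
        // set the column name to read in the overridden setup method,
        // as described above ("text" is a placeholder)
        columnName = "text".getBytes();
    }

    @Override
    public void map(String key, SortedMap<byte[], IColumn> columns, Context context)
            throws IOException, InterruptedException {
        IColumn column = columns.get(columnName);
        if (column == null) return; // row may lack the column
        context.write(new Text(new String(column.value())), new LongWritable(1));
    }
}
```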
>> For output:
>> - in your reducer, you could just write to cassandra directly via 
>> thrift.  there is a built-in outputformat coming in 0.7, though it 
>> might still change before 0.7 final - it queues up changes and 
>> writes them out in large batches.
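A reducer that writes back via Thrift, as suggested above, might look roughly like this. Again a sketch under assumptions: it uses the 0.6-era Thrift `insert` call (which takes an explicit keyspace argument), and the host, port, keyspace, column family, and column name are all placeholders; 0.7's built-in outputformat would replace this hand-rolled client.

```java
// Illustrative sketch (not compilable standalone -- requires the
// cassandra-thrift and hadoop jars). Placeholder names throughout.
public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private Cassandra.Client client;

    @Override
    protected void setup(Context context) throws IOException {
        try {
            // open a raw thrift connection to one cassandra node
            TTransport transport = new TSocket("localhost", 9160);
            transport.open();
            client = new Cassandra.Client(new TBinaryProtocol(transport));
        } catch (TException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void reduce(Text word, Iterable<LongWritable> counts, Context context)
            throws IOException {
        long sum = 0;
        for (LongWritable c : counts) sum += c.get();
        // write one column per reduced key
        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn("count".getBytes());
        try {
            client.insert("Keyspace1", word.toString(), path,
                    Long.toString(sum).getBytes(),
                    System.currentTimeMillis(), ConsistencyLevel.QUORUM);
        } catch (TException e) {
            throw new IOException(e);
        }
    }
}
```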
>> On Aug 19, 2010, at 12:07 PM, Mark wrote:
>>> Are there any examples/tutorials on the web for reading/writing from 
>>> Cassandra into/from Hadoop?
>>> I found the example in contrib/word_count but I really can't make 
>>> sense of it... a tutorial/explanation would help.
> Thanks!
How does batching across all rows work? Does it just take an arbitrary 
start w/ a limit of x and then use the last key from that result as the 
next start? Does this work with RandomPartitioner?
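For what it's worth, the last-key-as-next-start scheme the question describes can be modeled in plain Java (a toy over an in-memory sorted map, not the actual Thrift calls). Two real-world notes: with RandomPartitioner the server iterates rows in token order rather than key order, and the Hadoop input side sidesteps per-client paging by splitting the ring into token ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of paging over all rows: fetch `limit` rows from a start
// key (inclusive), then reuse the last key returned as the next start.
public class RangePaging {
    // One "page" of up to `limit` row keys, starting at startKey inclusive.
    static List<String> page(SortedMap<String, String> rows, String startKey, int limit) {
        List<String> keys = new ArrayList<>();
        for (String k : rows.tailMap(startKey).keySet()) {
            if (keys.size() == limit) break;
            keys.add(k);
        }
        return keys;
    }

    // Full scan: because the start key is inclusive, each subsequent
    // page's first row repeats the previous page's last row and must
    // be dropped.
    static List<String> scanAll(SortedMap<String, String> rows, int limit) {
        List<String> all = new ArrayList<>();
        if (rows.isEmpty()) return all;
        List<String> page = page(rows, rows.firstKey(), limit);
        all.addAll(page);
        while (!page.isEmpty()) {
            String last = page.get(page.size() - 1);
            List<String> next = page(rows, last, limit);
            next.remove(0); // drop the duplicated previous last key
            if (next.isEmpty()) break;
            all.addAll(next);
            page = next;
        }
        return all;
    }

    public static void main(String[] args) {
        SortedMap<String, String> rows = new TreeMap<>();
        for (char c = 'a'; c <= 'g'; c++) rows.put(String.valueOf(c), "v");
        System.out.println(scanAll(rows, 3)); // prints [a, b, c, d, e, f, g]
    }
}
```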
