cassandra-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Cassandra w/ Hadoop
Date Thu, 19 Aug 2010 18:14:28 GMT
  On 8/19/10 10:34 AM, Christian Decker wrote:
> If, like me, you prefer to write your jobs on the fly try taking a 
> look at Pig. Cassandra provides a loadfunc under contrib/pig/ in the 
> source package which allows you to load data directly from Cassandra.
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>
>
> On Thu, Aug 19, 2010 at 7:23 PM, Jeremy Hanna 
> <jeremy.hanna1234@gmail.com> wrote:
>
>     I would check out http://wiki.apache.org/cassandra/HadoopSupport
>     for more info.  I'll try to explain a bit more here, but I don't
>     think there's a tutorial out there yet.
>
>     For input:
>     - configure your main class where you're starting the mapreduce
>     job the way the word_count is configured (with either storage-conf
>     or in your code via the ConfigHelper).  It will complain
>     specifically about anything you haven't configured - especially
>     important are your Cassandra server and port.
>     - the inputs to your mapper are going to be what's coming from
>     cassandra - so your key with a map of row values
>     - you need to set your column name in your overridden setup method
>     in your mapper
>     - for the reducer, nothing really changes from a normal
>     map/reduce, unless you want to output to cassandra
>     - generally cassandra just provides an inputformat and split
>     classes to read from cassandra - you can find the guts in the
>     org.apache.cassandra.hadoop package
>
>     For output:
>     - in your reducer, you could just write to Cassandra directly via
>     Thrift.  There is a built-in outputformat coming in 0.7, but it
>     might still change before 0.7 final - it will queue up changes
>     and write them in large blocks all at once.
>
>
>     On Aug 19, 2010, at 12:07 PM, Mark wrote:
>
>     > Are there any examples/tutorials on the web for reading/writing
>     > from Cassandra into/from Hadoop?
>     >
>     > I found the example in contrib/word_count but I really can't
>     > make sense of it... a tutorial/explanation would help.
>
>
That's definitely an option, and I'll probably lean towards it in the 
near future. For now I am just trying to get a complete understanding 
of the whole infrastructure before working with higher-level features.

The same problem exists there, though... I still need a nice tutorial :)
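
A rough sketch of the input steps quoted above, in plain Java with no
Hadoop or Cassandra dependencies. It only illustrates the shape of the
data flow: the mapper receives a row key plus a sorted map of that row's
columns, reads the one column it was set up with, and the reducer sums
counts as in any word count. Every name here (CassandraWordCountSketch,
TokenMapper, run, the "text" column) is hypothetical, and using String
for keys, column names and values is a simplifying assumption. A real job
would use the inputformat and split classes from the
org.apache.cassandra.hadoop package instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for a Cassandra-backed word count; not the real API.
final class CassandraWordCountSketch {

    // Plays the role of the mapper. The constructor stands in for the
    // overridden setup method, where the column name is chosen.
    static final class TokenMapper {
        private final String columnName;

        TokenMapper(String columnName) {
            this.columnName = columnName;
        }

        // Input: one row key and a sorted map of column name -> value.
        // Emits (word, 1) pairs into the shared 'emitted' map.
        // rowKey is unused here; a word count only needs the column value.
        void map(String rowKey, SortedMap<String, String> columns,
                 Map<String, List<Integer>> emitted) {
            String text = columns.get(columnName);
            if (text == null) {
                return; // this row does not have the configured column
            }
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emitted.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }
    }

    // Plays the role of the reducer: nothing Cassandra-specific, just a sum.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Drives map and reduce over in-memory rows (row key -> columns).
    static SortedMap<String, Integer> run(
            Map<String, SortedMap<String, String>> rows, String columnName) {
        TokenMapper mapper = new TokenMapper(columnName);
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (Map.Entry<String, SortedMap<String, String>> row : rows.entrySet()) {
            mapper.map(row.getKey(), row.getValue(), emitted);
        }
        SortedMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : emitted.entrySet()) {
            totals.put(e.getKey(), reduce(e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, SortedMap<String, String>> rows = new LinkedHashMap<>();

        SortedMap<String, String> row1 = new TreeMap<>();
        row1.put("text", "hello cassandra hello hadoop");
        row1.put("author", "hello"); // ignored: not the configured column
        rows.put("key1", row1);

        SortedMap<String, String> row2 = new TreeMap<>();
        row2.put("text", "hadoop");
        rows.put("key2", row2);

        // prints {cassandra=1, hadoop=2, hello=2}
        System.out.println(run(rows, "text"));
    }
}
```

Only the "text" column is counted, so the "hello" stored under "author"
does not affect the totals. That is the point of setting the column name
once at setup time rather than scanning every column in the row.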
