Hi all,
Is it possible to use the Cassandra ColumnFamilyInputFormat with a Hadoop
Streaming job? The Hadoop docs say that you can specify other plugins,
e.g.:
-inputformat JavaClassName
http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs
However, they then say:
"The class you supply for the input format should return key/value pairs of
Text class."
The Cassandra wiki, on the other hand, says:
"Cassandra rows or row fragments (that is, pairs of key + SortedMap of
columns) are input to Map tasks for processing by your job"
http://wiki.apache.org/cassandra/HadoopSupport
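To make the mismatch concrete, here is a rough sketch of the flattening a
wrapper input format would have to do: turn a row key plus its sorted
columns into a single line of text. It uses plain Java collections only;
the class and method names are made up, and String stands in for the
byte[] keys and column objects Cassandra actually hands you (an
assumption for illustration, not the real ColumnFamilyInputFormat types).

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class RowFlattener {
    // Hypothetical helper: joins a row key and its columns into one
    // tab-separated line, e.g. "row1\tage=42\tname=Dave".
    // Assumes keys, column names and values are plain Strings; real
    // Cassandra rows carry byte[] keys and column objects instead.
    static String flatten(String rowKey, SortedMap<String, String> columns) {
        StringBuilder line = new StringBuilder(rowKey);
        for (Map.Entry<String, String> col : columns.entrySet()) {
            // Columns come out in sorted order because the map is sorted.
            line.append('\t').append(col.getKey()).append('=').append(col.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        SortedMap<String, String> columns = new TreeMap<>();
        columns.put("name", "Dave");
        columns.put("age", "42");
        // Prints the row as one text line a streaming script could read.
        System.out.println(flatten("row1", columns));
    }
}
```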
So I'm wondering whether this would work or if it's just never going to
happen. I guess the alternative is to write the job as a Hadoop Java class,
but that is what I'm trying to avoid.
Has anyone got any examples of getting MapReduce working with Cassandra as
the input source?
Thanks
Dave