cassandra-user mailing list archives

From Dave Gardner <dave.gard...@imagini.net>
Subject Cassandra / Hadoop
Date Wed, 16 Jun 2010 15:55:45 GMT
Hi all,

Is it possible to use the Cassandra ColumnFamilyInputFormat in combination
with a Hadoop Streaming job? The Hadoop docs say that you can specify
other plugins, e.g.:

-inputformat JavaClassName

http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs

However it then says:

"The class you supply for the input format should return key/value pairs of
Text class."

Whereas the Cassandra Wiki says:

"Cassandra rows or row fragments (that is, pairs of key + SortedMap of
columns) are input to Map tasks for processing by your job"
http://wiki.apache.org/cassandra/HadoopSupport

So I'm wondering whether this would work, or whether it's just never going
to happen. I guess the alternative is to write the job as a Hadoop Java
class, but that is what I'm trying to avoid.
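
To make the mismatch concrete, here is a rough sketch of the kind of
flattening I imagine would be needed to turn a row (key + SortedMap of
columns) into a single key/value pair of text. It is plain Java with no
Hadoop or Cassandra classes. The class name, the method name and the
name=value encoding are all my own invention for illustration, and I am
using Strings where real column names and values would be byte arrays.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.StringJoiner;

// Hypothetical helper, not part of Hadoop or Cassandra.
final class StreamingRowFlattener {
    private StreamingRowFlattener() {}

    // Flattens one row into "key<TAB>name1=value1,name2=value2".
    // Assumption: names and values are already strings and contain
    // no tab, comma or '=' characters (no escaping is done here).
    static String toStreamingLine(String rowKey, SortedMap<String, String> columns) {
        StringJoiner joined = new StringJoiner(",");
        // A SortedMap iterates in key order, so the column order is stable.
        for (Map.Entry<String, String> column : columns.entrySet()) {
            joined.add(column.getKey() + "=" + column.getValue());
        }
        return rowKey + "\t" + joined;
    }
}
```

Something like this would presumably have to sit between the input format
and the streaming script, which is the Java I was hoping not to write.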

Does anyone have any examples of getting M/R working with Cassandra as an
input source?

Thanks

Dave
