cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "HadoopSupport" by jeremyhanna
Date Wed, 16 Jun 2010 23:06:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "HadoopSupport" page has been changed by jeremyhanna.
http://wiki.apache.org/cassandra/HadoopSupport?action=diff&rev1=11&rev2=12

--------------------------------------------------

  == Overview ==
- Cassandra version 0.6 and later enable certain Hadoop functionality against Cassandra's data store.  Specifically, support has been added for !MapReduce and Pig.
+ Cassandra version 0.6 and later enable certain Hadoop functionality against Cassandra's data store.  Specifically, support has been added for [[http://hadoop.apache.org/mapreduce/|MapReduce]] and [[http://hadoop.apache.org/pig/|Pig]].
  
  == MapReduce ==
- While writing output to Cassandra has always been possible by implementing certain interfaces from the Hadoop library, version 0.6 of Cassandra added support for retrieving data from Cassandra.  Cassandra 0.6 adds implementations of !InputSplit, !InputFormat, and !RecordReader so that Hadoop !MapReduce jobs can retrieve data from Cassandra.  For an example of how this works, see the contrib/word_count example in 0.6 or later.  Cassandra rows or row  fragments (that is, pairs of key + `SortedMap`  of columns) are input to Map tasks for  processing by your job, as specified by a `SlicePredicate`  that describes which columns to fetch from each row.
+ While writing output to Cassandra has always been possible by implementing certain interfaces from the Hadoop library, version 0.6 of Cassandra added support for retrieving data from Cassandra.  Cassandra 0.6 adds implementations of [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/InputSplit.html|InputSplit]], [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/InputFormat.html|InputFormat]], and [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/RecordReader.html|RecordReader]] so that Hadoop [[http://hadoop.apache.org/mapreduce/|MapReduce]] jobs can retrieve data from Cassandra.  For an example of how this works, see the contrib/word_count example in 0.6 or later.  Cassandra rows or row  fragments (that is, pairs of key + `SortedMap`  of columns) are input to Map tasks for  processing by your job, as specified by a `SlicePredicate`  that describes which columns to fetch from each row.
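
The row model described above (a key paired with a sorted map of columns, narrowed by a predicate naming the columns to fetch) can be sketched in plain Java. This is only an illustrative model and not the code from contrib/word_count: it uses no Cassandra or Hadoop classes, and the names `SliceSketch` and `sliceByNames`, along with the sample rows, are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative model only: none of these names come from Cassandra or Hadoop.
public class SliceSketch {

    // Keep only the requested columns of one row; the TreeMap keeps them sorted by name.
    static SortedMap<String, String> sliceByNames(SortedMap<String, String> row,
                                                  List<String> columnNames) {
        SortedMap<String, String> fragment = new TreeMap<>();
        for (String name : columnNames) {
            String value = row.get(name);
            if (value != null) {
                fragment.put(name, value);
            }
        }
        return fragment;
    }

    public static void main(String[] args) {
        // Row key -> sorted columns, standing in for the rows of a column family.
        SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

        SortedMap<String, String> first = new TreeMap<>();
        first.put("author", "alice");
        first.put("text", "hello hadoop hello cassandra");
        rows.put("key1", first);

        SortedMap<String, String> second = new TreeMap<>();
        second.put("author", "bob");
        rows.put("key2", second);

        // Hypothetical predicate: fetch just the "text" column from each row.
        List<String> wanted = Arrays.asList("text");

        // Each (key, fragment) pair plays the role of one input record to a map task.
        for (Map.Entry<String, SortedMap<String, String>> entry : rows.entrySet()) {
            SortedMap<String, String> fragment = sliceByNames(entry.getValue(), wanted);
            System.out.println(entry.getKey() + " -> " + fragment);
        }
    }
}
```

In this sketch a row that lacks the requested column simply yields an empty fragment; whether and how such rows reach a real map task depends on the actual input format, which this model does not cover.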
  
  Here's how this looks in the word_count example, which selects just one  configurable columnName from each row:
  
