incubator-cassandra-dev mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Mahout/Cassandra integration
Date Thu, 04 Nov 2010 19:53:08 GMT
For people interested in using Cassandra with Mahout, there are a few possible integration
points that could be fleshed out.  I was talking with Grant Ingersoll about this at ApacheCon
and thought I would send out a note about it.  The motivation would be to enhance Cassandra's
analytics capabilities by running Mahout on data stored in Cassandra.

drivers - in the bin directory there is a script that loads drivers.  By default, those drivers
feed input to the algorithms from sequence files through the HDFS InputFormat.  They could
possibly use Cassandra's InputFormat instead, or have a pluggable option.  I'm not sure where
the output comes into play, but I would think that it could likewise just use Cassandra's
OutputFormat.
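
To make the "pluggable option" idea a bit more concrete, here is a minimal sketch in plain Java.
It does not use the real Mahout, Hadoop or Cassandra APIs: VectorSource, SequenceFileSource,
CassandraSource and sourceFor are all hypothetical names made up for illustration, and both
sources return hard-coded rows.  It only shows the shape of a driver that picks its input
by name instead of being tied to one input format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class PluggableInputSketch {

    /** Hypothetical abstraction over where a driver reads its input rows from. */
    interface VectorSource {
        List<double[]> read();
    }

    /** Stand-in for the default path (sequence files on HDFS); returns canned data. */
    static class SequenceFileSource implements VectorSource {
        public List<double[]> read() {
            List<double[]> rows = new ArrayList<>();
            rows.add(new double[] {1.0, 2.0});
            return rows;
        }
    }

    /** Stand-in for a Cassandra-backed input; returns canned data, no real client calls. */
    static class CassandraSource implements VectorSource {
        public List<double[]> read() {
            List<double[]> rows = new ArrayList<>();
            rows.add(new double[] {3.0, 4.0});
            rows.add(new double[] {5.0, 6.0});
            return rows;
        }
    }

    /** Registry mapping a source name (an assumed driver option) to a factory. */
    private static final Map<String, Supplier<VectorSource>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("hdfs", SequenceFileSource::new);
        REGISTRY.put("cassandra", CassandraSource::new);
    }

    /** Looks up a source by name and fails loudly on an unknown name. */
    static VectorSource sourceFor(String name) {
        Supplier<VectorSource> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown input source: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // Default to the existing behaviour when no source is named.
        String name = args.length > 0 ? args[0] : "hdfs";
        List<double[]> rows = sourceFor(name).read();
        System.out.println(name + " source returned " + rows.size() + " row(s)");
    }
}
```

In a real integration the two stand-in classes would presumably wrap the Hadoop input formats
themselves rather than return lists, but the lookup-by-name part would look much the same.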

DataModel - https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/DataModel.html

DataStore - https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/classifier/bayes/interfaces/Datastore.html
Currently there are an HBase and an in-memory data store, so adding a Cassandra one would be
a relatively simple integration point.

Another integration point in the future might be using Flume for output, which could also go
through Flume to Cassandra via the Cassandra sink that Tyler Hobbs wrote - https://github.com/thobbs/flume-cassandra-plugin

Anyway, just wanted to relay that info.