spark-user mailing list archives

From Joe Bako <jb...@gracenote.com>
Subject Spark GraphX + TitanDB + Cassandra?
Date Tue, 26 Jan 2016 20:19:22 GMT
I’ve found some references online to various implementations (such as Dendrite) leveraging
HDFS via TitanDB + HBase for graph processing.  GraphLab also uses HDFS/Hadoop.  I am wondering
if (and how) one might use TitanDB + Cassandra as the data source for Spark GraphX?  The Gremlin
language seems more targeted towards basic traversals than analytics, and I’m unsure about
the performance of using Gremlin to load sub-graphs into GraphX for analysis.
 For example, if I have a large property graph and wish to run algorithms to find similar
sub-graphs within, would TitanDB/Gremlin even be a consideration?  The underlying data model
that Titan uses in Cassandra does not seem accessible for direct querying via CQL/Thrift.

Any guidance around this nebulous subject is much appreciated!

Joe Bako
Software Architect
Gracenote, Inc.
Mobile: 925.818.2230
http://www.gracenote.com/
