incubator-cassandra-user mailing list archives

From Claudio Martella <claudio.marte...@tis.bz.it>
Subject OPP and controlling partitioning
Date Mon, 15 Nov 2010 14:53:22 GMT
Hello list,

I'm writing an application that uses Cassandra as its storage backend.
The application is a graph database, and it is meant to serve as a
baseline for further development in the field.

The idea is to implement a property graph: a multigraph (multiple edges
between the same two vertices are possible) whose edges and vertices
carry name/value properties. The graph is traversed with queries like
"give me all the women who are liked by men I know", something like:
Vertex[name=claudio]=>outgoingEdge[type=knows]=>Vertex[gender=male]=>outgoingEdge[type=likes]=>Vertex[gender=female].
This is basically a step-by-step expansion/filtering based on properties.
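
To make the expansion/filtering idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the traversal steps, not the Cassandra-backed implementation: the `Vertex` class, the `expand` helper, and all sample vertices other than "claudio" are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    props: dict
    # Each outgoing edge is (edge_properties, target_vertex_id).
    out_edges: list = field(default_factory=list)


def expand(graph, frontier, edge_type, vertex_filter):
    """Follow outgoing edges of edge_type from every vertex in frontier and
    keep only the target vertices whose properties match vertex_filter."""
    result = set()
    for vid in frontier:
        for edge_props, target in graph[vid].out_edges:
            if edge_props.get("type") != edge_type:
                continue
            target_props = graph[target].props
            if all(target_props.get(k) == v for k, v in vertex_filter.items()):
                result.add(target)
    return result


# Hypothetical sample data, for illustration only.
graph = {
    "v1": Vertex("v1", {"name": "claudio", "gender": "male"}),
    "v2": Vertex("v2", {"name": "marco", "gender": "male"}),
    "v3": Vertex("v3", {"name": "anna", "gender": "female"}),
    "v4": Vertex("v4", {"name": "lucia", "gender": "female"}),
}
graph["v1"].out_edges += [({"type": "knows"}, "v2"), ({"type": "knows"}, "v4")]
graph["v2"].out_edges += [({"type": "likes"}, "v3")]
graph["v4"].out_edges += [({"type": "likes"}, "v2")]

# Vertex[name=claudio] => knows => Vertex[gender=male] => likes => Vertex[gender=female]
start = {vid for vid, v in graph.items() if v.props.get("name") == "claudio"}
men_i_know = expand(graph, start, "knows", {"gender": "male"})
liked_women = expand(graph, men_i_know, "likes", {"gender": "female"})
```

Each call to `expand` is one hop of the query, which is the unit of work I would like to push to the node that stores the frontier vertices.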

In my architecture, each application-logic node is co-located with the
Cassandra node that stores its data. I'd like to have some kind of
"atomic set" of data that is guaranteed to be stored on the same
Cassandra node (in my case the vertex, its adjacency list, its
properties, its edges and their properties), so that I can send the
required filtering and expansion to the particular node that holds the
data and let it run the logic locally (and I can route such a request
with the same logic Cassandra uses to route its own requests).
The goal is to (a) minimize network I/O (I'd be able to send the query
token to the application node, which would issue a local get to its
local Cassandra) and (b) distribute computation (I'd be able to
distribute filtering across all the nodes storing, for example, a
vertex's neighborhood). This is still not optimal, but it would be a
good start.

For this reason I thought about a data model with composite row keys,
where vertexid and edgeid are UUIDs and propertyname is a string:

CF vertices {
    vertexid_propertyname {
        propertyvalue: null
    }
}

CF edges {
    vertexid_[in|out]_propertyname_edgeid {
        propertyvalue: othervertexid
    }
}
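
As a rough sketch of how such composite keys behave under a sorted key order, the Python below builds keys in the layout above and selects everything sharing a prefix with two binary searches. The helper names (`vertex_key`, `edge_key`, `prefix_slice`) and the short ids are hypothetical, and a sorted Python list only stands in for an ordered key space; this is not a Cassandra API.

```python
import bisect

SEP = "_"


def vertex_key(vertexid, propertyname):
    """Row key for CF vertices: vertexid_propertyname."""
    return f"{vertexid}{SEP}{propertyname}"


def edge_key(vertexid, direction, propertyname, edgeid):
    """Row key for CF edges: vertexid_[in|out]_propertyname_edgeid."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return SEP.join([vertexid, direction, propertyname, edgeid])


def prefix_slice(sorted_keys, prefix):
    """Return every key that starts with prefix from a sorted list of keys.

    The upper bound appends the highest Unicode code point, which assumes
    that character never appears in a real key."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[lo:hi]


# Short made-up ids instead of real UUIDs, to keep the example readable.
keys = sorted([
    vertex_key("a1", "name"),
    edge_key("a1", "out", "type", "e1"),
    edge_key("a1", "out", "type", "e2"),
    edge_key("a1", "in", "type", "e3"),
    edge_key("b2", "out", "type", "e4"),
    vertex_key("a10", "name"),
])

all_rows_of_a1 = prefix_slice(keys, "a1" + SEP)
outgoing_of_a1 = prefix_slice(keys, "a1" + SEP + "out" + SEP)
```

Including the separator in the prefix keeps vertex "a10" out of the slice for vertex "a1".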

With this data model I could easily and efficiently issue slices and
range queries to Cassandra with the equality predicates on properties
that I need. What I need now is to partition my data on the prefix
"vertexid_". Such a data model does have a notion of ascending key
order, so I thought about OPP (the order-preserving partitioner). To my
understanding, however, OPP does not guarantee that all the rows whose
keys share a prefix end up on the same Cassandra node, only that some
of them do. The set of rows for one vertex could still be split between
two Cassandra nodes if a node's token falls on a key in the middle of
that set, right?
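
The split I am worried about can be sketched as follows. This is a simplified model, assuming tokens are plain strings compared like keys and that each node owns the range from the previous token (exclusive) up to its own token (inclusive), wrapping around at the end of the ring. The `owner` function and the token values are made up for the illustration.

```python
import bisect


def owner(ring_tokens, key):
    """Return the token of the node owning key in a sorted token ring:
    the first token >= key, wrapping around to the first token."""
    i = bisect.bisect_left(ring_tokens, key)
    return ring_tokens[i % len(ring_tokens)]


# Hypothetical ring in which one token happens to fall inside vertex a1's rows.
tokens = sorted(["a1_out_type_e1", "m", "z"])

rows = ["a1_in_type_e3", "a1_name", "a1_out_type_e1", "a1_out_type_e2"]
placement = {row: owner(tokens, row) for row in rows}

# All rows share the prefix "a1_", yet they land on two different nodes.
nodes_used = set(placement.values())
```

Here the row "a1_out_type_e2" sorts just after the first token, so it belongs to the next node while the other three rows of the same vertex stay on the first one.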

What I require exactly is:

(1) to have all the rows belonging to the same vertexid (which is a
UUID) on the same Cassandra node. Can I achieve this?
(2) given this partitioning, to know, from outside Cassandra, the IP of
the Cassandra node storing a given vertex's data. This is the logic
Cassandra uses to route requests for keys, and I need access to it from
outside.

Can anybody comment on these?
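
What I have in mind for both points could look like the sketch below: placement is decided by the vertexid prefix alone rather than by the whole row key, and the application keeps its own copy of the token-to-address table. Everything here is an assumption for illustration: the `Ring` class is hypothetical, the tokens and addresses are invented (the addresses come from the documentation range 192.0.2.0/24), and whether Cassandra can be made to partition this way is exactly my question.

```python
import bisect


class Ring:
    """Client-side copy of a token ring, mapping each node token to an IP."""

    def __init__(self, token_to_ip):
        self.token_to_ip = dict(token_to_ip)
        self.tokens = sorted(self.token_to_ip)

    def node_for_vertex(self, vertexid):
        # First token >= vertexid owns it; wrap around past the last token.
        i = bisect.bisect_left(self.tokens, vertexid)
        return self.token_to_ip[self.tokens[i % len(self.tokens)]]

    def node_for_row(self, row_key):
        # Route on the "vertexid" part only, so every row of a vertex
        # (properties, in-edges, out-edges) maps to the same node.
        vertexid = row_key.split("_", 1)[0]
        return self.node_for_vertex(vertexid)


# Hypothetical three-node ring over hexadecimal vertex ids.
ring = Ring({"5": "192.0.2.1", "a": "192.0.2.2", "f": "192.0.2.3"})

ip_property_row = ring.node_for_row("3f2b_name")
ip_edge_row = ring.node_for_row("3f2b_out_type_e1")
ip_wrapped = ring.node_for_vertex("ff01")
```

With such a table the application node could forward a traversal step directly to the address that holds the vertex, instead of going through an arbitrary coordinator.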


Thanks


Claudio


Unit Research & Development - Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it



