On Mon, Mar 1, 2010 at 3:58 PM, Ian Dees wrote:
Hi List,

I was attempting to play around with Cassandra last month using the example of the OpenStreetMap database as a target to frame my learning. Now that I have a little extra time to start this endeavor again, I'm wondering if you could help me understand how the OpenStreetMap data model could fit into the Cassandra model.

You've chosen something that is fairly complicated for playing around. :)

A quick overview of the data model I have in mind:

1. There is a "node" which has:
    - Location (lat/lon)
    - Numeric id
    - Tags (list of key/value pairs)

2. A "way" has:
    - An ordered list of nodes
    - Numeric id
    - Tags (list of key/value pairs)

There is one step beyond this, but I'm wondering if you could help me fit this simple first step into Cassandra.

There is almost always more than one way to model an application. How you need to query your data is usually the most important factor when modelling.

My queries would be something like the following:
1. What are the nodes in a given bounding box, and what are the ways attached to those nodes?

This is the greatest constraint. You need to query in two dimensions, so to simplify this I might suggest storing the node coordinates in a Z-order curve: http://en.wikipedia.org/wiki/Z-order_(curve) This will reduce the dimensionality so that you can more easily range scan or slice without having to do multiple queries and then perform intersection in the client. There is a research paper on this technique in range-capable DHTs here: http://www.geo.unizh.ch/~rsp/gir06/papers/individual/soro.pdf
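To make the Z-order idea concrete, here is a minimal sketch of turning a lat/lon pair into a sortable key by interleaving bits. The function names, the 16-bit resolution, and the hex key format are all assumptions for illustration, not anything Cassandra or OpenStreetMap prescribes.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits land on even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits land on odd positions
    return code


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a value in [lo, hi] onto an integer in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * cells)


def z_order_key(lat: float, lon: float, bits: int = 16) -> str:
    """Fixed-width hex key, so that string order matches numeric order."""
    code = interleave_bits(quantize(lon, -180.0, 180.0, bits),
                           quantize(lat, -90.0, 90.0, bits), bits)
    width = (2 * bits + 3) // 4                  # hex digits needed for 2*bits bits
    return format(code, "0{}x".format(width))
```

For a bounding box, the key range from `z_order_key(min_lat, min_lon)` to `z_order_key(max_lat, max_lon)` covers every node inside the box, but it also picks up some nodes outside it, so the exact bounds still need to be checked in the client.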

Another approach that was suggested to me on IRC might be to partition areas into fixed-size 'chunks', identified by the upper-left corner. These would become the row keys, with the columns being the node keys. Since the chunks all have the same height and width, it's relatively straightforward to convert a bounding box to the short list of chunks that you need to query, but you'll have to do some filtering client-side to meet the exact bounds.

Once you have the nodes, finding their ways in another column family where the rows are nodes and their ways are columns should be relatively easy.
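Roughly, the chunk approach plus the node-to-ways lookup could look like the sketch below. Plain dicts of the form {row key: {column name: value}} stand in for the two column families; the 0.01-degree chunk size, the key format, and every name here are assumptions made up for the example.

```python
import math

CHUNKS_PER_DEGREE = 100  # hypothetical: each chunk is 0.01 x 0.01 degrees


def chunk_key(lat, lon):
    """Row key for the chunk containing (lat, lon): its upper-left corner,
    expressed in integer chunk units to avoid float formatting issues."""
    top = math.floor(lat * CHUNKS_PER_DEGREE) + 1   # northern edge
    left = math.floor(lon * CHUNKS_PER_DEGREE)      # western edge
    return "{}:{}".format(top, left)


def chunks_for_bbox(min_lat, min_lon, max_lat, max_lon):
    """List the row keys of every chunk that overlaps the bounding box."""
    rows = range(math.floor(min_lat * CHUNKS_PER_DEGREE),
                 math.floor(max_lat * CHUNKS_PER_DEGREE) + 1)
    cols = range(math.floor(min_lon * CHUNKS_PER_DEGREE),
                 math.floor(max_lon * CHUNKS_PER_DEGREE) + 1)
    return ["{}:{}".format(r + 1, c) for r in rows for c in cols]


def query_bbox(nodes_by_chunk, ways_by_node, min_lat, min_lon, max_lat, max_lon):
    """Return (node ids inside the box, way ids attached to those nodes)."""
    node_ids = set()
    for key in chunks_for_bbox(min_lat, min_lon, max_lat, max_lon):
        for node_id, (lat, lon) in nodes_by_chunk.get(key, {}).items():
            # Chunks overhang the box, so filter to the exact bounds here.
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                node_ids.add(node_id)
    way_ids = set()
    for node_id in node_ids:
        way_ids.update(ways_by_node.get(node_id, {}))
    return node_ids, way_ids
```

In a real cluster each `nodes_by_chunk.get(key)` would be one row read, so a smaller chunk size means more reads per query but less client-side filtering.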

2. What are the tags and nodes for a way with a numeric id of x?

Use column families where the row keys are the numeric IDs, and the columns are the tags or nodes.
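As a rough illustration, with dicts standing in for the column families again: the names `way_tags` and `way_nodes` are made up, and using a zero-padded position as the column name is just one assumed way to keep the node order of a way, relying on columns in a row being returned sorted by name.

```python
# Hypothetical column families, modeled as {row_key: {column_name: value}}.
way_tags = {}    # way id -> {tag key: tag value}
way_nodes = {}   # way id -> {zero-padded position: node id}


def put_way(way_id, node_ids, tags):
    """Store a way's tags and its ordered node list under the way's id."""
    way_tags[way_id] = dict(tags)
    # A fixed-width position as the column name keeps the nodes in order
    # when the columns are read back sorted by name.
    way_nodes[way_id] = {"{:06d}".format(i): n for i, n in enumerate(node_ids)}


def get_way(way_id):
    """Return (tags, ordered node ids) for one way."""
    cols = way_nodes.get(way_id, {})
    ordered = [cols[name] for name in sorted(cols)]
    return way_tags.get(way_id, {}), ordered
```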

3. What are the nodes that have a tag key of "foo"? How about nodes that have "foo" = "bar"?

Use another column family where the row keys are the tags and the columns are the node ids. For the 'foo=bar' situation, make that string the tag name, i.e. the row key.
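A small sketch of that tag index, with a dict in place of the column family. The name `nodes_by_tag` and the choice to write both a "foo" row and a "foo=bar" row for every tag are assumptions for the example.

```python
# Hypothetical column family: row key ("foo" or "foo=bar") -> {node id: ""}.
nodes_by_tag = {}


def index_node_tags(node_id, tags):
    """Write one column per node under both the key row and the key=value row."""
    for key, value in tags.items():
        nodes_by_tag.setdefault(key, {})[node_id] = ""
        nodes_by_tag.setdefault("{}={}".format(key, value), {})[node_id] = ""


def nodes_with_tag(key, value=None):
    """Node ids carrying the tag key, optionally restricted to one value."""
    row_key = key if value is None else "{}={}".format(key, value)
    return sorted(nodes_by_tag.get(row_key, {}))
```

Both queries then become a single row read, at the cost of two writes per tag when a node is stored.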

-Brandon