# incubator-cassandra-user mailing list archives

##### Site index · List index
Message view
Top
From Malcolm Smith <malsm...@treehousesystems.com>
Subject Re: How to perform queries on Cassandra?
Date Sat, 10 Apr 2010 01:11:52 GMT
```Mike are you stuck on a train too? :-)

On Apr 9, 2010, at 8:51 PM, Mike Gallamore <mike.e.gallamore@googlemail.com
> wrote:

> I apologize in advance if this goes into esoteric algorithms a bit
> too much but I think this will get to an interesting idea to solve
> your problem. My background is physics particularly computer
> simulations of complex systems. Anyways in cosmology an interesting
> algorithm is called an n-body tree code (its been around for at
> least 20 years so a lot is available online about it). Since every
> object with mass (well in general relativity actually anything with
> energy but I digress) interacts with every other object with mass,
> you end up with the "n-body" problem. The number of interactions in
> a system goes as n(n-1) ~= n^2 where n is the number of elements.
> This lead to a nightmare to do simulations of large systems, say two
> galaxies colliding. 1 billion X 1 billion minus one is huge and
> effectively incalculable since you would have to calculate this each
> time you wanted to increment the simulation a tiny bit ahead in
> time. How do you get a reasonable approximation to the solution? The
> answer or at least one of them is n-body "tree codes".
>
> You take advantage of the fact that the the force that one star
> feels from another falls off as 1/r^2 and importantly two stars far
> away from the first star but close together relatively have roughly
> the same magnitude and direction of the "r" vector. So you can
> simply clump them together, ie sum there masses, and the force is GM1
> (M"sum")/r^2. To do this efficiently numerically you break down the
> system using binary search trees. Thinking in 2D just to keep it
> simple, you divide the space into top left, top right, bottom left
> bottom right as a first approximation. Then continually do that
> until you end up with each element in its own box. When you figure
> out the forces you are going to apply to the system you just take
> the distance to the middle of the box that contains the ones you are
> going to consider together (the closer to the star in question the
> smaller the boxes need to be because the direction of r changes
> quicker the closer the boxes are to the star, but farther away you
> can use larger and larger boxes (which would contain a 2D tree like
> structure descending to the point where each of the stars contained
> are trapped in there own little box), sum the number of stars in the
> box and presto.
>
> How would this help you? Well if you encoded the "box hierachy", say
> 1 for top left, 2 for top right, 3 for bottom left, 4 for bottom
> right, then you could specify the box that someone is in based on a
> string like "14234". To find the set of stars/points/whatever that
> are at least x away you just would have to do a range search for all
> the points with their location "string" larger than or equal to the
> location sting corresponding to the closest corner of the biggest
> box such that its corner is at least "x" units away. Quite good as a
> first approximation and the search algorithm should run as O(nlog
> (n)) which is a logirithmic decrease in computation time. Ie the 1
> billion times 1 billion -1 problem becomes 1 billion times ~9, much
> much nicer. Really difficult thing to explain without looking over a
> diagram in person I admit but hopefully it makes sense if you look
> up the algorithm online.
>
>
> On 04/09/2010 05:01 PM, malsmith wrote:
>>
>>
>>
>> It's sort of an interesting problem - in RDBMS one relatively
>> simple approach would be calculate a rectangle that is X km by Y km
>> with User 1's location at the center.  So the rectangle is UserX -
>> 10KmX , UserY-10KmY to UserX+10KmX , UserY+10KmY
>>
>> Then you could query the database for all other users where that
>> each user considered is curUserX > UserX-10Km and curUserX < UserX
>> +10KmX and curUserY > UserY-10KmY and curUserY < UserY+10KmY
>> * Not the 10KmX and 10KmY are really a translation from Kilometers
>> to degrees of  lat and longitude  (that you can find on a google
>> search)
>>
>> With the right indexes this query actually runs pretty well.
>>
>> Translating that to Cassandra seems a bit complex at first - but
>> you could try something like pre-calculating a grid with the right
>> resolution (like a square of 5KM per side) and assign every user to
>> a particular grid ID.  That way you just calculate with grid ID
>> User1 is in then do a direct key lookup to get a list of the users
>> in that same grid id.
>>
>> A second approach would be to have to column families -- one that
>> maps a Latitude to a list of users who are at that latitude and a
>> second that maps users who are at a particular longitude.  You
>> could do the same rectange calculation above then do a get_slice
>> range lookup to get a list of users from range of latitude and a
>> second list from the range of longitudes.    You would then need to
>> do a in-memory nested loop to find the list of users that are in
>> both lists.  This second approach could cause some trouble
>> depending on where you search and how many users you really have --
>> some latitudes and longitudes have many many people in them
>>
>> So, it seems some version of a chunking / grid id thing would be
>> the better approach.   If you let people zoom in or zoom out - you
>> could just have different column families for each level of zoom.
>>
>>
>> I'm stuck on a stopped train so -- here is even more code:
>>
>> static Decimal GetLatitudeMiles(Decimal lat)
>> {
>> Decimal f = 0.0M;
>> lat = Math.Abs(lat);
>> f = 68.99M;
>>          if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
>> else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
>> else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
>> else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
>> else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
>> else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
>> else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
>> else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
>> else if (lat >= 80.0M) { f = 69.38M; }
>>
>> return f;
>> }
>>
>>
>> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
>> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos
>> ((Double) zList[0].Latitude))) * 24900.0M / 360.0M;
>>                         dRadius = 10.0M  // ten miles
>> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
>> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>>
>> ps.TopLatitude = zList[0].Latitude - deltaLat;
>> ps.TopLongitude = zList[0].Longitude - deltaLong;
>> ps.BottomLatitude = zList[0].Latitude + deltaLat;
>> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>>
>>
>>
>> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>>>
>>> 2010/4/9 Onur AKTAS <onur.aktas@live.com>:
>>> > ...
>>> > I'm trying to find out how do you perform queries with
>>> calculations on the
>>> > fly without inserting the data as calculated from the beginning.
>>> > Lets say we have latitude and longitude coordinates of all users
>>> and we have
>>> >  Distance(from_lat, from_long, to_lat, to_long) function which
>>> > gives distance between lat/longs pairs in kilometers.
>>>
>>> I'm not an expert, but I think that it boils down to "MapReduce"
>>> and "Hadoop".
>>>
>>> I don't think that there's any top-down tutorial on those two words,
>>> you'll have to research yourself starting here:
>>>
>>>  * http://en.wikipedia.org/wiki/MapReduce
>>>
>>>  * http://hadoop.apache.org/
>>>
>>>  * http://wiki.apache.org/cassandra/HadoopSupport
>>>
>>> I don't think it is all documented in any one place yet...
>>>
>>>  Paul Prescod
>>>
>>
>

```
Mime
View raw message