From: dir dir
To: user@cassandra.apache.org
Subject: Re: How to perform queries on Cassandra?
Date: Sat, 10 Apr 2010 10:00:49 +0700
In-Reply-To: <1270857660.3807.23.camel@malsmith-laptop>
References: <1270857660.3807.23.camel@malsmith-laptop>
Reply-To: user@cassandra.apache.org
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=000e0cd1eb0a816d9c0483d91efc

--000e0cd1eb0a816d9c0483d91efc
Content-Type: text/plain; charset=ISO-8859-1

Does Cassandra have a default query language, such as SQL in an RDBMS or Object Query in an OODBMS? Thank you.

Dir.

On Sat, Apr 10, 2010 at 7:01 AM, malsmith <malsmith@treehousesystems.com> wrote:


It's sort of an interesting problem. In an RDBMS, one relatively simple approach would be to calculate a rectangle that is X km by Y km with User 1's location at the center. So the rectangle runs from (UserX - 10KmX, UserY - 10KmY) to (UserX + 10KmX, UserY + 10KmY).

Then you could query the database for all other users where each user considered satisfies curUserX > UserX - 10KmX and curUserX < UserX + 10KmX and curUserY > UserY - 10KmY and curUserY < UserY + 10KmY.
* Note that 10KmX and 10KmY are really a translation from kilometers to degrees of latitude and longitude (the conversion factors are easy to find with a Google search).

With the right indexes this query actually runs pretty well.
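
As a rough illustration of that rectangle test, here is a minimal in-memory sketch in C#. The `User` type, the `RectangleQuery` class and the parameter names are made up for this example and are not part of any database API; X is taken to be longitude and Y latitude, both in degrees, and wrap-around at 180 degrees of longitude is ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type: X is longitude and Y is latitude, in degrees.
public sealed class User
{
    public string Name { get; }
    public double X { get; }
    public double Y { get; }

    public User(string name, double x, double y)
    {
        Name = name;
        X = x;
        Y = y;
    }
}

public static class RectangleQuery
{
    // Returns every other user strictly inside the rectangle centered on
    // `center`, extending deltaX degrees east/west and deltaY degrees
    // north/south. The four comparisons mirror the query conditions.
    public static List<User> UsersInRectangle(
        IEnumerable<User> users, User center, double deltaX, double deltaY)
    {
        return users
            .Where(u => !object.ReferenceEquals(u, center))
            .Where(u => u.X > center.X - deltaX && u.X < center.X + deltaX
                     && u.Y > center.Y - deltaY && u.Y < center.Y + deltaY)
            .ToList();
    }
}
```

In a real RDBMS the same four comparisons would sit in the WHERE clause, with indexes on the two coordinate columns doing the work that the linear scan does here.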

Translating that to Cassandra seems a bit complex at first, but you could try something like pre-calculating a grid with the right resolution (like a square of 5 km per side) and assigning every user to a particular grid ID. That way you just calculate which grid ID User 1 is in, then do a direct key lookup to get a list of the users in that same grid ID.
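
A minimal sketch of the grid idea in C#, with a `Dictionary` standing in for a column family keyed by grid ID. The `GridIndex` class, the "row:col" key format and the 111 km-per-degree factor are assumptions made for this example; it also treats a degree of longitude like a degree of latitude, so cells get narrower in kilometers away from the equator.

```csharp
using System;
using System.Collections.Generic;

public sealed class GridIndex
{
    // Assumed conversion factor: roughly 111 km per degree of latitude.
    private const double KmPerDegree = 111.0;

    private readonly double cellDegrees;

    // In-memory stand-in for a column family: grid ID -> users in that cell.
    private readonly Dictionary<string, List<string>> usersByGrid =
        new Dictionary<string, List<string>>();

    // cellKm is the side length of one grid square, e.g. 5 for 5 km.
    public GridIndex(double cellKm)
    {
        cellDegrees = cellKm / KmPerDegree;
    }

    // Maps a position to the ID of the grid cell that contains it.
    public string GridId(double latitude, double longitude)
    {
        long row = (long)Math.Floor(latitude / cellDegrees);
        long col = (long)Math.Floor(longitude / cellDegrees);
        return row + ":" + col;
    }

    // Assigns a user to the cell for the given position.
    public void Add(string user, double latitude, double longitude)
    {
        string id = GridId(latitude, longitude);
        if (!usersByGrid.TryGetValue(id, out List<string> users))
        {
            users = new List<string>();
            usersByGrid[id] = users;
        }
        users.Add(user);
    }

    // Direct key lookup: everyone stored under the cell for this position.
    public IReadOnlyList<string> UsersInSameCell(double latitude, double longitude)
    {
        return usersByGrid.TryGetValue(GridId(latitude, longitude), out List<string> users)
            ? users
            : new List<string>();
    }
}
```

One limitation of a single-cell lookup is that a user standing near a cell edge has close neighbors in the adjacent cells, so a fuller version would probably look up the surrounding cells as well.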

A second approach would be to have two column families: one that maps a latitude to a list of users who are at that latitude, and a second that maps a longitude to a list of users who are at that longitude. You could do the same rectangle calculation as above, then do a get_slice range lookup to get one list of users from the range of latitudes and a second list from the range of longitudes. You would then need to do an in-memory nested loop to find the users that are in both lists. This second approach could cause some trouble depending on where you search and how many users you really have, since some latitudes and longitudes have many, many people in them.
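
The two-index approach could look roughly like this in C#. Two `SortedDictionary` instances stand in for the two column families and a simple scan stands in for the get_slice range lookup; none of this is Cassandra client code, and the `LatLongIndex` class is invented for the example. The sketch intersects two hash sets rather than running the nested loop described above, which gives the same result.

```csharp
using System;
using System.Collections.Generic;

public sealed class LatLongIndex
{
    // Stand-ins for the two column families, kept sorted by key so that
    // a contiguous range of latitudes or longitudes can be read.
    private readonly SortedDictionary<double, List<string>> byLatitude =
        new SortedDictionary<double, List<string>>();
    private readonly SortedDictionary<double, List<string>> byLongitude =
        new SortedDictionary<double, List<string>>();

    public void Add(string user, double latitude, double longitude)
    {
        Append(byLatitude, latitude, user);
        Append(byLongitude, longitude, user);
    }

    private static void Append(
        SortedDictionary<double, List<string>> index, double key, string user)
    {
        if (!index.TryGetValue(key, out List<string> users))
        {
            users = new List<string>();
            index[key] = users;
        }
        users.Add(user);
    }

    // Collects the users whose key lies in [low, high]. This plays the
    // role of the range lookup; it is a linear scan for brevity.
    private static HashSet<string> Slice(
        SortedDictionary<double, List<string>> index, double low, double high)
    {
        var result = new HashSet<string>();
        foreach (KeyValuePair<double, List<string>> entry in index)
        {
            if (entry.Key > high) { break; } // keys are sorted, so stop early
            if (entry.Key >= low) { result.UnionWith(entry.Value); }
        }
        return result;
    }

    // Users that appear in both the latitude slice and the longitude slice.
    public HashSet<string> UsersInRectangle(
        double minLat, double maxLat, double minLong, double maxLong)
    {
        HashSet<string> matches = Slice(byLatitude, minLat, maxLat);
        matches.IntersectWith(Slice(byLongitude, minLong, maxLong));
        return matches;
    }
}
```

The cost concern from the paragraph above shows up directly: each slice can return every user in a whole band around the globe before the intersection throws most of them away.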

So, it seems some version of a chunking / grid id thing would be the better approach. If you let people zoom in or zoom out - you could just have different column families for each level of zoom.
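
One way to picture the per-zoom idea, as a small C# sketch: each zoom level gets its own cell size and its own column family name. The cell sizes, the `UsersByGrid_Zoom` naming scheme and the `ZoomGrid` class are all hypothetical choices for illustration.

```csharp
using System;

public static class ZoomGrid
{
    // Hypothetical cell sizes in degrees, one per zoom level.
    // Index 0 is the most zoomed-out level.
    private static readonly double[] CellDegreesByZoom = { 1.0, 0.25, 0.05 };

    // Name of the column family that would hold the grid for a zoom level.
    public static string ColumnFamilyFor(int zoom)
    {
        return "UsersByGrid_Zoom" + zoom;
    }

    // Grid ID of the cell containing the position at the given zoom level.
    public static string GridId(int zoom, double latitude, double longitude)
    {
        double cell = CellDegreesByZoom[zoom];
        long row = (long)Math.Floor(latitude / cell);
        long col = (long)Math.Floor(longitude / cell);
        return row + ":" + col;
    }
}
```

With this layout a user is written once per zoom level, and a lookup at any zoom is still a single key read in the matching column family.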


I'm stuck on a stopped train so -- here is even more code:

// Approximate miles per degree of latitude, by 10-degree band.
static Decimal GetLatitudeMiles(Decimal lat)
{
    Decimal f;
    lat = Math.Abs(lat); // the bands are symmetric about the equator
    if (lat < 10.0M) { f = 68.71M; }
    else if (lat < 20.0M) { f = 68.73M; }
    else if (lat < 30.0M) { f = 68.79M; }
    else if (lat < 40.0M) { f = 68.88M; }
    else if (lat < 50.0M) { f = 68.99M; }
    else if (lat < 60.0M) { f = 69.12M; }
    else if (lat < 70.0M) { f = 69.23M; }
    else if (lat < 80.0M) { f = 69.32M; }
    else { f = 69.38M; }

    return f;
}


Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
// Math.Cos expects radians, so convert the latitude from degrees first.
Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos((Double) zList[0].Latitude * Math.PI / 180.0))) * 24900.0M / 360.0M;
dRadius = 10.0M; // ten miles
Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
Decimal deltaLong = dRadius / MilesPerDegreeLongitude;

// Corners of the search rectangle around the first entry in zList.
ps.TopLatitude = zList[0].Latitude - deltaLat;
ps.TopLongitude = zList[0].Longitude - deltaLong;
ps.BottomLatitude = zList[0].Latitude + deltaLat;
ps.BottomLongitude = zList[0].Longitude + deltaLong;
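
Since the fragment above depends on `zList` and `ps`, which are not shown, here is a self-contained sketch of the same calculation. The `BoundingBox` struct and the `SearchArea.Around` method are hypothetical stand-ins, and the miles-per-degree-of-latitude value is passed in as a parameter (it would come from something like GetLatitudeMiles). Math.Cos takes radians, so the sketch converts the latitude from degrees before calling it.

```csharp
using System;

// Hypothetical stand-in for the `ps` object in the fragment above.
public struct BoundingBox
{
    public decimal TopLatitude;
    public decimal TopLongitude;
    public decimal BottomLatitude;
    public decimal BottomLongitude;
}

public static class SearchArea
{
    // Builds the box reaching radiusMiles in each direction from the
    // center. Latitude and longitude are in degrees. The result is not
    // meaningful very close to the poles, where cos(latitude) nears zero.
    public static BoundingBox Around(
        decimal latitude, decimal longitude,
        decimal radiusMiles, decimal milesPerDegreeLatitude)
    {
        double latitudeRadians = (double)latitude * Math.PI / 180.0;

        // 24900 miles is the approximate equatorial circumference used
        // in the fragment above; a degree of longitude shrinks with
        // the cosine of the latitude.
        decimal milesPerDegreeLongitude =
            (decimal)Math.Abs(Math.Cos(latitudeRadians)) * 24900.0M / 360.0M;

        decimal deltaLat = radiusMiles / milesPerDegreeLatitude;
        decimal deltaLong = radiusMiles / milesPerDegreeLongitude;

        return new BoundingBox
        {
            TopLatitude = latitude - deltaLat,
            TopLongitude = longitude - deltaLong,
            BottomLatitude = latitude + deltaLat,
            BottomLongitude = longitude + deltaLong
        };
    }
}
```

A call such as `SearchArea.Around(40.0M, -75.0M, 10.0M, 68.99M)` would give the corners for a ten-mile search around that point; the box is wider in degrees of longitude than in degrees of latitude because a degree of longitude covers fewer miles at that latitude.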




On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
2010/4/9 Onur AKTAS <onur.aktas@live.com>:
> ...
> I'm trying to find out how you perform queries with calculations on the
> fly without inserting the data as calculated from the beginning.
> Let's say we have latitude and longitude coordinates of all users and we have
> a Distance(from_lat, from_long, to_lat, to_long) function which
> gives the distance between lat/long pairs in kilometers.
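
The message does not say how Distance is implemented. One common choice is the haversine formula, sketched below in C# under the assumptions of a spherical Earth with a mean radius of 6371 km and coordinates given in degrees; the `Geo` class name is made up for the example.

```csharp
using System;

public static class Geo
{
    // Assumed mean Earth radius in kilometers (spherical approximation).
    private const double EarthRadiusKm = 6371.0;

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }

    // Great-circle distance in kilometers between two points given in
    // degrees, using the haversine formula.
    public static double Distance(
        double fromLat, double fromLong, double toLat, double toLong)
    {
        double dLat = ToRadians(toLat - fromLat);
        double dLong = ToRadians(toLong - fromLong);

        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(fromLat)) * Math.Cos(ToRadians(toLat))
                 * Math.Sin(dLong / 2) * Math.Sin(dLong / 2);

        // Clamp to 1.0 to guard against rounding just above the valid range.
        return 2 * EarthRadiusKm * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }
}
```

A function like this is cheap enough to run on the fly over a short candidate list, which is why it helps to narrow the candidates first instead of evaluating it against every stored user.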

I'm not an expert, but I think that it boils down to "MapReduce" and "Hadoop".

I don't think that there's any top-down tutorial on those two words,
you'll have to research yourself starting here:

 * http://en.wikipedia.org/wiki/MapReduce

 * http://hadoop.apache.org/

 * http://wiki.apache.org/cassandra/HadoopSupport

I don't think it is all documented in any one place yet...

 Paul Prescod


--000e0cd1eb0a816d9c0483d91efc--