Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of prescod@gmail.com designates
 209.85.210.194 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:reply-to:in-reply-to:references:from:date:message-id
         :subject:to:content-type:content-transfer-encoding;
        b=G+O+EBS3psASCkf4LUohmyfrdcnwC23XL51G4NIQFg6tmdD1tJCxCKIT6vm5Q9aXKi
         fbNKjOdfbwvWyiWxFhZ3vt5ahFC1cVfMCsfddFcuYcmtNqDGf1BiJnjfYdxNtdMFnAPS
         fhfTEa5fFd+yJD6YOQz9WVNu6e6a57Vc9JJ28=
MIME-Version: 1.0
Reply-To: paul@prescod.net
In-Reply-To: <y2i8f9ac4a91004111112s9bf7440am1e5a66faed6b0724@mail.gmail.com>
References: <COL122-W33496ADBFA6EEE8B1631EBE7150@phx.gbl>
 <h2r7ca6fc441004102034k81b6bbddw4d05af62e78045a1@mail.gmail.com>
	<p2ia7fcf8301004110043tb3896d19s1d46344657af49d6@mail.gmail.com>
	<x2r8f9ac4a91004110054s3e6c0b1czc201398036c053c1@mail.gmail.com>
	<q2pa7fcf8301004110131y4af6fbf7p8c64daad25fb9dd8@mail.gmail.com>
	<x2t8f9ac4a91004110233s120c5290i5fb466b985ffa9b@mail.gmail.com>
	<001801cad975$168b0110$43a10330$@com>
 <w2g8f9ac4a91004110612l60c324b3g53fced1a634d4d25@mail.gmail.com>
	<s2sa7fcf8301004111036iae304266q25b441ddb1f5ae51@mail.gmail.com>
	<y2i8f9ac4a91004111112s9bf7440am1e5a66faed6b0724@mail.gmail.com>
From: Paul Prescod <prescod@gmail.com>
Date: Sun, 11 Apr 2010 11:15:26 -0700
Message-ID: <y2z1cb725391004111115o95ee01a4jf9004ecc72afb1f2@mail.gmail.com>
Subject: Re: How to perform queries on Cassandra?
To: user@cassandra.apache.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Benjamin is pointing out that you must be using the word "username" to
mean something different than he is using it.

BY DEFINITION usernames are unique in the most common use of the word.
So what do you really mean if not "username"?

On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vineetdaniel@gmail.com> wr=
ote:
> its not a problem its a scenario, which we need to handle. And all I am
> trying to do is to achieve what is not there with API i.e a workaroud.
>
> On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b@b3k.us> wrote:
>>
>> A system that permits multiple people to have the same username has a
>> serious problem.
>>
>> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vineetdaniel@gmail.com>
>> wrote:
>> > How to handle same usernames. Otherwise seems fine to me.
>> >
>> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <sunht@dopsun.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>
>> >>
>> >> As far as I can see it, the Cassandra API currently supports criteria=
s
>> >> on:
>> >>
>> >> Token =E2=80=93 Key =E2=80=93 Super Column Name (if applicable) - Col=
umn Names
>> >>
>> >>
>> >>
>> >> I guess Token is not usually used for the day to day queries, so, Key
>> >> and
>> >> Column Names are normally used for querying. For the user name and
>> >> password
>> >> case, I guess it can be done like this:
>> >>
>> >>
>> >>
>> >> Define a CF as UserAuth with type as Super, and Key is user name, whi=
le
>> >> password can be the SuperKeyName. So, while you receive the user name
>> >> and
>> >> password from the UI (or any other methods), it can be queried via:
>> >> multiget_slice or get_range_slices, if there are anything returned,
>> >> means
>> >> that the user name and password matches.
>> >>
>> >>
>> >>
>> >> If not using the super column name, and put the password as the colum=
n
>> >> name, the column name usually not used for these kind of discretionar=
y
>> >> values (actually, I don=E2=80=99t see any definitive documents on how=
 to use
>> >> the
>> >> column Names and Super Columns, flexibility is the good of Cassandra,
>> >> or is
>> >> it bad if abused? :P)
>> >>
>> >>
>> >>
>> >> Not sure whether this is the best way, but I guess it will work.
>> >>
>> >>
>> >>
>> >> Regards,
>> >>
>> >> Dop
>> >>
>> >>
>> >>
>> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
>> >> Sent: Sunday, April 11, 2010 5:33 PM
>> >> To: user@cassandra.apache.org
>> >> Subject: Re: How to perform queries on Cassandra?
>> >>
>> >>
>> >>
>> >> Hi Benjamin
>> >>
>> >> I'll try to make it more clear to you.
>> >> We have a user table with fields 'id', 'username', and 'password'. No=
w
>> >> if
>> >> use the ideal way to store key/value, like :
>> >> username : vineetdaniel
>> >> timestamp
>> >> password : <password>
>> >> timestamp
>> >>
>> >> second user :
>> >>
>> >> username: <seconduser>
>> >> timestamp
>> >> password:<password>
>> >>
>> >> and so on, here what i assume is that as we cannot make search on
>> >> values
>> >> (as confirmed by guys on cassandra forums) we are not able to perform
>> >> robust
>> >> 'where' queries. Now what i propose is this.
>> >>
>> >> Rather than using a static values for column names use values itself
>> >> and
>> >> unique key as identifier. So, the above example when put in as per me
>> >> would
>> >> be.
>> >>
>> >> vineetdaniel : vineetdaniel
>> >> timestamp
>> >>
>> >> <password>:<password>
>> >> timestamp
>> >>
>> >> second user
>> >> seconduser:seconduser
>> >> timestamp
>> >>
>> >> password:password
>> >> timestamp
>> >>
>> >> By using above methodology we can simply make search on keys itself
>> >> rather
>> >> than going into using different CF's. But to add further, this cannot
>> >> be
>> >> used for every situation. I am still exploring this, and soon will be
>> >> updating the group and my blog with information pertaining to this. A=
s
>> >> cassandra is new, I think every idea or experience should be shared
>> >> with the
>> >> community.
>> >>
>> >> I hope I example is clear this time. Should you have any queries feel
>> >> free
>> >> to revert.
>> >>
>> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b@b3k.us> wrote:
>> >>
>> >> Sorry, I don't understand your example.
>> >>
>> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
>> >> <vineetdaniel@gmail.com> wrote:
>> >> > Benjamin I quite agree to you, but what in case of duplicate
>> >> > usernames,
>> >> > suppose if I am not using unique names as in email id's . If we hav=
e
>> >> > duplicacy in usernames we cannot use it for key, so what should be
>> >> > the
>> >> > solution. I think keeping incremental numeric id as key and keeping
>> >> > the
>> >> > name
>> >> > and value same in the column family.
>> >> >
>> >> > Example :
>> >> > User1 has password as 123456
>> >> >
>> >> > Cassandra structure :
>> >> >
>> >> > 1 as key
>> >> > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 user1 =
- column name
>> >> > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 value =
- user1
>> >> > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 123456=
 - column name
>> >> > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =
value - 123456
>> >> >
>> >> > I m thinking of doing it this way for my applicaton, this way i can
>> >> > run
>> >> > different sorts of queries too. Any feedback on this is welcome.
>> >> >
>> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b@b3k.us> wrote:
>> >> >>
>> >> >> You would have a Column Family, not a column for that; let's call =
it
>> >> >> the Users CF. =C2=A0You'd use username as the row key and have a c=
olumn
>> >> >> called 'password'. =C2=A0For your example query, you'd retrieve ro=
w key
>> >> >> 'usr2', column 'password'. =C2=A0The general pattern is that you c=
reate
>> >> >> CFs
>> >> >> to act as indices for each query you want to perform. =C2=A0There =
is no
>> >> >> equivalent to a relational store to perform arbitrary queries. =C2=
=A0You
>> >> >> must structure things to permit the queries of interest.
>> >> >>
>> >> >>
>> >> >> b
>> >> >>
>> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <sikerasakti@gmail.com>
>> >> >> wrote:
>> >> >> > I have already read the API spesification. Honestly I do not
>> >> >> > understand
>> >> >> > how to use it. Because there are not an examples.
>> >> >> >
>> >> >> > For example I have a column like this:
>> >> >> >
>> >> >> > UserName =C2=A0=C2=A0 Password
>> >> >> > usr1 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 abc
>> >> >> > usr2 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 xyz
>> >> >> > usr3 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 opm
>> >> >> >
>> >> >> > suppose I want query the user's password using SQL in RDBMS
>> >> >> >
>> >> >> > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Select Password From Users Where =
UserName =3D "usr2";
>> >> >> >
>> >> >> > Now I want to get the password using OODBMS DB4o Object Query=C2=
=A0 and
>> >> >> > Java
>> >> >> >
>> >> >> > =C2=A0=C2=A0=C2=A0=C2=A0 ObjectSet QueryResult =3D db.query(new =
Predicate()
>> >> >> > =C2=A0=C2=A0=C2=A0=C2=A0 {
>> >> >> > =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 public boole=
an match(Users Myusers)
>> >> >> > =C2=A0=C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {
>> >> >> > =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0 return Myuser.getUserName() =3D=3D "usr2";
>> >> >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0=C2=A0 }
>> >> >> > =C2=A0=C2=A0=C2=A0=C2=A0 });
>> >> >> >
>> >> >> > After we get the Users instance in the QueryResult, hence we can
>> >> >> > get
>> >> >> > the
>> >> >> > usr2's password.
>> >> >> >
>> >> >> > How we perform this query using Cassandra API and Java??
>> >> >> > Would you tell me please??=C2=A0 Thank You.
>> >> >> >
>> >> >> > Dir.
>> >> >> >
>> >> >> >
>> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <paul@prescod.net=
>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> No. Cassandra has an API.
>> >> >> >>
>> >> >> >> http://wiki.apache.org/cassandra/API
>> >> >> >>
>> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <sikerasakti@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > Does Cassandra has a default query language such as SQL in
>> >> >> >> > RDBMS
>> >> >> >> > and Object Query in OODBMS?=C2=A0 Thank you.
>> >> >> >> >
>> >> >> >> > Dir.
>> >> >> >> >
>> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
>> >> >> >> > <malsmith@treehousesystems.com>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> It's sort of an interesting problem - in RDBMS one relativel=
y
>> >> >> >> >> simple
>> >> >> >> >> approach would be calculate a rectangle that is X km by Y km
>> >> >> >> >> with
>> >> >> >> >> User
>> >> >> >> >> 1's
>> >> >> >> >> location at the center.=C2=A0 So the rectangle is UserX - 10=
KmX ,
>> >> >> >> >> UserY-10KmY to
>> >> >> >> >> UserX+10KmX , UserY+10KmY
>> >> >> >> >>
>> >> >> >> >> Then you could query the database for all other users where
>> >> >> >> >> that
>> >> >> >> >> each
>> >> >> >> >> user
>> >> >> >> >> considered is curUserX > UserX-10Km and curUserX < UserX+10K=
mX
>> >> >> >> >> and
>> >> >> >> >> curUserY
>> >> >> >> >> > UserY-10KmY and curUserY < UserY+10KmY
>> >> >> >> >> * Not the 10KmX and 10KmY are really a translation from
>> >> >> >> >> Kilometers
>> >> >> >> >> to
>> >> >> >> >> degrees of=C2=A0 lat and longitude=C2=A0 (that you can find =
on a google
>> >> >> >> >> search)
>> >> >> >> >>
>> >> >> >> >> With the right indexes this query actually runs pretty well.
>> >> >> >> >>
>> >> >> >> >> Translating that to Cassandra seems a bit complex at first -
>> >> >> >> >> but
>> >> >> >> >> you
>> >> >> >> >> could
>> >> >> >> >> try something like pre-calculating a grid with the right
>> >> >> >> >> resolution
>> >> >> >> >> (like a
>> >> >> >> >> square of 5KM per side) and assign every user to a particula=
r
>> >> >> >> >> grid
>> >> >> >> >> ID.
>> >> >> >> >> That
>> >> >> >> >> way you just calculate with grid ID User1 is in then do a
>> >> >> >> >> direct
>> >> >> >> >> key
>> >> >> >> >> lookup
>> >> >> >> >> to get a list of the users in that same grid id.
>> >> >> >> >>
>> >> >> >> >> A second approach would be to have to column families -- one
>> >> >> >> >> that
>> >> >> >> >> maps
>> >> >> >> >> a
>> >> >> >> >> Latitude to a list of users who are at that latitude and a
>> >> >> >> >> second
>> >> >> >> >> that
>> >> >> >> >> maps
>> >> >> >> >> users who are at a particular longitude.=C2=A0 You could do =
the
>> >> >> >> >> same
>> >> >> >> >> rectange
>> >> >> >> >> calculation above then do a get_slice range lookup to get a
>> >> >> >> >> list
>> >> >> >> >> of
>> >> >> >> >> users
>> >> >> >> >> from range of latitude and a second list from the range of
>> >> >> >> >> longitudes.
>> >> >> >> >> You would then need to do a in-memory nested loop to find th=
e
>> >> >> >> >> list
>> >> >> >> >> of
>> >> >> >> >> users
>> >> >> >> >> that are in both lists.=C2=A0 This second approach could cau=
se some
>> >> >> >> >> trouble
>> >> >> >> >> depending on where you search and how many users you really
>> >> >> >> >> have
>> >> >> >> >> --
>> >> >> >> >> some
>> >> >> >> >> latitudes and longitudes have many many people in them
>> >> >> >> >>
>> >> >> >> >> So, it seems some version of a chunking / grid id thing woul=
d
>> >> >> >> >> be
>> >> >> >> >> the
>> >> >> >> >> better approach.=C2=A0=C2=A0 If you let people zoom in or zo=
om out - you
>> >> >> >> >> could
>> >> >> >> >> just
>> >> >> >> >> have different column families for each level of zoom.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
>> >> >> >> >>
>> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
>> >> >> >> >> {
>> >> >> >> >> Decimal f =3D 0.0M;
>> >> >> >> >> lat =3D Math.Abs(lat);
>> >> >> >> >> f =3D 68.99M;
>> >> >> >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (lat >=
=3D 0.0M && lat < 10.0M) { f =3D 68.71M; }
>> >> >> >> >> else if (lat >=3D 10.0M && lat < 20.0M) { f =3D 68.73M; }
>> >> >> >> >> else if (lat >=3D 20.0M && lat < 30.0M) { f =3D 68.79M; }
>> >> >> >> >> else if (lat >=3D 30.0M && lat < 40.0M) { f =3D 68.88M; }
>> >> >> >> >> else if (lat >=3D 40.0M && lat < 50.0M) { f =3D 68.99M; }
>> >> >> >> >> else if (lat >=3D 50.0M && lat < 60.0M) { f =3D 69.12M; }
>> >> >> >> >> else if (lat >=3D 60.0M && lat < 70.0M) { f =3D 69.23M; }
>> >> >> >> >> else if (lat >=3D 70.0M && lat < 80.0M) { f =3D 69.32M; }
>> >> >> >> >> else if (lat >=3D 80.0M) { f =3D 69.38M; }
>> >> >> >> >>
>> >> >> >> >> return f;
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> Decimal MilesPerDegreeLatitude =3D
>> >> >> >> >> GetLatitudeMiles(zList[0].Latitude);
>> >> >> >> >> Decimal MilesPerDegreeLongitude =3D ((Decimal)
>> >> >> >> >> Math.Abs(Math.Cos((Double)
>> >> >> >> >> zList[0].Latitude))) * 24900.0M / 360.0M;
>> >> >> >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0 dRadius =3D 10.0M=C2=A0 // ten miles
>> >> >> >> >> Decimal deltaLat =3D dRadius / MilesPerDegreeLatitude;
>> >> >> >> >> Decimal deltaLong =3D dRadius / MilesPerDegreeLongitude;
>> >> >> >> >>
>> >> >> >> >> ps.TopLatitude =3D zList[0].Latitude - deltaLat;
>> >> >> >> >> ps.TopLongitude =3D zList[0].Longitude - deltaLong;
>> >> >> >> >> ps.BottomLatitude =3D zList[0].Latitude + deltaLat;
>> >> >> >> >> ps.BottomLongitude =3D zList[0].Longitude + deltaLong;
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> >> >> >> >>
>> >> >> >> >> 2010/4/9 Onur AKTAS <onur.aktas@live.com>:
>> >> >> >> >> > ...
>> >> >> >> >> > I'm trying to find out how do you perform queries with
>> >> >> >> >> > calculations
>> >> >> >> >> > on
>> >> >> >> >> > the
>> >> >> >> >> > fly without inserting the data as calculated from the
>> >> >> >> >> > beginning.
>> >> >> >> >> > Lets say we have latitude and longitude coordinates of all
>> >> >> >> >> > users
>> >> >> >> >> > and
>> >> >> >> >> > we
>> >> >> >> >> > have
>> >> >> >> >> > =C2=A0Distance(from_lat, from_long, to_lat, to_long) funct=
ion
>> >> >> >> >> > which
>> >> >> >> >> > gives distance between lat/longs pairs in kilometers.
>> >> >> >> >>
>> >> >> >> >> I'm not an expert, but I think that it boils down to
>> >> >> >> >> "MapReduce"
>> >> >> >> >> and
>> >> >> >> >> "Hadoop".
>> >> >> >> >>
>> >> >> >> >> I don't think that there's any top-down tutorial on those tw=
o
>> >> >> >> >> words,
>> >> >> >> >> you'll have to research yourself starting here:
>> >> >> >> >>
>> >> >> >> >> =C2=A0* http://en.wikipedia.org/wiki/MapReduce
>> >> >> >> >>
>> >> >> >> >> =C2=A0* http://hadoop.apache.org/
>> >> >> >> >>
>> >> >> >> >> =C2=A0* http://wiki.apache.org/cassandra/HadoopSupport
>> >> >> >> >>
>> >> >> >> >> I don't think it is all documented in any one place yet...
>> >> >> >> >>
>> >> >> >> >> =C2=A0Paul Prescod
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >
>
>