cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dop Sun" <su...@dopsun.com>
Subject RE: How to perform queries on Cassandra?
Date Sun, 11 Apr 2010 13:34:26 GMT
While we are talking same user names, it’s the application design to help figuring out the
difference on other attributes (actually, it’s not Cassandra related, it’s application/
domain issue):

 

 Let’s say there are two scenarios (let me know if there are more):

1.       The identity behind the user name actually are same. In this case, only one is valid.
this happens when one employee re-joins the company

2.       The identity behind the same user name actually are different, in this case, both
are valid. this happens when two clients to open the account, but they happen to have the
same name

 

For the scenario 1, it can be resolved like this:

CF1-UserNameState: Key = user name, Super Column Name: create time (time UUID)

CF2-UserAuth: key=user name, SuperColumn: password

While you are creating the second entry for the same user name (in CF1, with new timeUUID,
the password in CF2 will be overwritten, and means the previously one has been disabled).
Then it’s fine.

 

Well, for scenario 2, it is a little bit complex, since it depends on the nature of your application.
Let’s take an example of the Bank Account Management system: two clients have same name
(both are dop? :P):

Then, it’s the bank’s responsibility to have a way to make me and another Dop different
while me and the other Dop using the same user name Dop to log into the system. In practice,
this is resolved by giving me and others a token generator (like what HSBC did). While logging
into the system, my token generated and his token generated are different. 

 

Again, there are two column families required:

CF1-User: Key= User Name, Super Column Name: unique id derived from token generated from my
generator, Column Name: User UUID: (uuid)

CF2-UserAuth: Key=user uuid, superColumnName: password

 

Then, the application derives the unique id from token inputs, and plus the user name, can
get the user real internal uuid based on the CF-1. Then, with CF-2 and password, it can authenticate
the user uniquely.

 

Cheers~~~

Dop

 

From: vineet daniel [mailto:vineetdaniel@gmail.com] 
Sent: Sunday, April 11, 2010 9:12 PM
To: user@cassandra.apache.org
Subject: Re: How to perform queries on Cassandra?

 

How to handle same usernames. Otherwise seems fine to me.

On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <sunht@dopsun.com> wrote:

Hi,

 

As far as I can see it, the Cassandra API currently supports criterias on:

Token – Key – Super Column Name (if applicable) - Column Names

 

I guess Token is not usually used for the day to day queries, so, Key and Column Names are
normally used for querying. For the user name and password case, I guess it can be done like
this:

 

Define a CF as UserAuth with type as Super, and Key is user name, while password can be the
SuperKeyName. So, while you receive the user name and password from the UI (or any other methods),
it can be queried via: multiget_slice or get_range_slices, if there are anything returned,
means that the user name and password matches. 

 

If not using the super column name, and put the password as the column name, the column name
usually not used for these kind of discretionary values (actually, I don’t see any definitive
documents on how to use the column Names and Super Columns, flexibility is the good of Cassandra,
or is it bad if abused? :P) 

 

Not sure whether this is the best way, but I guess it will work.

 

Regards,

Dop

 

From: Lucifer Dignified [mailto:vineetdaniel@gmail.com] 
Sent: Sunday, April 11, 2010 5:33 PM
To: user@cassandra.apache.org
Subject: Re: How to perform queries on Cassandra?

 

Hi Benjamin

I'll try to make it more clear to you. 
We have a user table with fields 'id', 'username', and 'password'. Now if use the ideal way
to store key/value, like :
username : vineetdaniel
timestamp
password : <password>
timestamp

second user :

username: <seconduser>
timestamp
password:<password>

and so on, here what i assume is that as we cannot make search on values (as confirmed by
guys on cassandra forums) we are not able to perform robust 'where' queries. Now what i propose
is this. 

Rather than using a static values for column names use values itself and unique key as identifier.
So, the above example when put in as per me would be.

vineetdaniel : vineetdaniel
timestamp

<password>:<password>
timestamp

second user
seconduser:seconduser
timestamp

password:password
timestamp

By using above methodology we can simply make search on keys itself rather than going into
using different CF's. But to add further, this cannot be used for every situation. I am still
exploring this, and soon will be updating the group and my blog with information pertaining
to this. As cassandra is new, I think every idea or experience should be shared with the community.

I hope I example is clear this time. Should you have any queries feel free to revert.

On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b@b3k.us> wrote:

Sorry, I don't understand your example.


On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
<vineetdaniel@gmail.com> wrote:
> Benjamin I quite agree to you, but what in case of duplicate usernames,
> suppose if I am not using unique names as in email id's . If we have
> duplicacy in usernames we cannot use it for key, so what should be the
> solution. I think keeping incremental numeric id as key and keeping the name
> and value same in the column family.
>
> Example :
> User1 has password as 123456
>
> Cassandra structure :
>
> 1 as key
>            user1 - column name
>            value - user1
>            123456 - column name
>             value - 123456
>
> I m thinking of doing it this way for my applicaton, this way i can run
> different sorts of queries too. Any feedback on this is welcome.
>
> On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b@b3k.us> wrote:
>>
>> You would have a Column Family, not a column for that; let's call it
>> the Users CF.  You'd use username as the row key and have a column
>> called 'password'.  For your example query, you'd retrieve row key
>> 'usr2', column 'password'.  The general pattern is that you create CFs
>> to act as indices for each query you want to perform.  There is no
>> equivalent to a relational store to perform arbitrary queries.  You
>> must structure things to permit the queries of interest.
>>
>>
>> b
>>
>> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <sikerasakti@gmail.com> wrote:
>> > I have already read the API spesification. Honestly I do not understand
>> > how to use it. Because there are not an examples.
>> >
>> > For example I have a column like this:
>> >
>> > UserName    Password
>> > usr1                abc
>> > usr2                xyz
>> > usr3                opm
>> >
>> > suppose I want query the user's password using SQL in RDBMS
>> >
>> >       Select Password From Users Where UserName = "usr2";
>> >
>> > Now I want to get the password using OODBMS DB4o Object Query  and Java
>> >
>> >      ObjectSet QueryResult = db.query(new Predicate()
>> >      {
>> >             public boolean match(Users Myusers)
>> >             {
>> >                  return Myuser.getUserName() == "usr2";
>> >             }
>> >      });
>> >
>> > After we get the Users instance in the QueryResult, hence we can get the
>> > usr2's password.
>> >
>> > How we perform this query using Cassandra API and Java??
>> > Would you tell me please??  Thank You.
>> >
>> > Dir.
>> >
>> >
>> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <paul@prescod.net> wrote:
>> >>
>> >> No. Cassandra has an API.
>> >>
>> >> http://wiki.apache.org/cassandra/API
>> >>
>> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <sikerasakti@gmail.com> wrote:
>> >> > Does Cassandra has a default query language such as SQL in RDBMS
>> >> > and Object Query in OODBMS?  Thank you.
>> >> >
>> >> > Dir.
>> >> >
>> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
>> >> > <malsmith@treehousesystems.com>
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >> It's sort of an interesting problem - in RDBMS one relatively simple
>> >> >> approach would be calculate a rectangle that is X km by Y km with
>> >> >> User
>> >> >> 1's
>> >> >> location at the center.  So the rectangle is UserX - 10KmX ,
>> >> >> UserY-10KmY to
>> >> >> UserX+10KmX , UserY+10KmY
>> >> >>
>> >> >> Then you could query the database for all other users where that
>> >> >> each
>> >> >> user
>> >> >> considered is curUserX > UserX-10Km and curUserX < UserX+10KmX
and
>> >> >> curUserY
>> >> >> > UserY-10KmY and curUserY < UserY+10KmY
>> >> >> * Not the 10KmX and 10KmY are really a translation from Kilometers
>> >> >> to
>> >> >> degrees of  lat and longitude  (that you can find on a google
>> >> >> search)
>> >> >>
>> >> >> With the right indexes this query actually runs pretty well.
>> >> >>
>> >> >> Translating that to Cassandra seems a bit complex at first - but
you
>> >> >> could
>> >> >> try something like pre-calculating a grid with the right resolution
>> >> >> (like a
>> >> >> square of 5KM per side) and assign every user to a particular grid
>> >> >> ID.
>> >> >> That
>> >> >> way you just calculate with grid ID User1 is in then do a direct
key
>> >> >> lookup
>> >> >> to get a list of the users in that same grid id.
>> >> >>
>> >> >> A second approach would be to have to column families -- one that
>> >> >> maps
>> >> >> a
>> >> >> Latitude to a list of users who are at that latitude and a second
>> >> >> that
>> >> >> maps
>> >> >> users who are at a particular longitude.  You could do the same
>> >> >> rectange
>> >> >> calculation above then do a get_slice range lookup to get a list
of
>> >> >> users
>> >> >> from range of latitude and a second list from the range of
>> >> >> longitudes.
>> >> >> You would then need to do a in-memory nested loop to find the list
>> >> >> of
>> >> >> users
>> >> >> that are in both lists.  This second approach could cause some
>> >> >> trouble
>> >> >> depending on where you search and how many users you really have
--
>> >> >> some
>> >> >> latitudes and longitudes have many many people in them
>> >> >>
>> >> >> So, it seems some version of a chunking / grid id thing would be
the
>> >> >> better approach.   If you let people zoom in or zoom out - you
could
>> >> >> just
>> >> >> have different column families for each level of zoom.
>> >> >>
>> >> >>
>> >> >> I'm stuck on a stopped train so -- here is even more code:
>> >> >>
>> >> >> static Decimal GetLatitudeMiles(Decimal lat)
>> >> >> {
>> >> >> Decimal f = 0.0M;
>> >> >> lat = Math.Abs(lat);
>> >> >> f = 68.99M;
>> >> >>          if (lat >= 0.0M && lat < 10.0M) { f = 68.71M;
}
>> >> >> else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M;
}
>> >> >> else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M;
}
>> >> >> else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M;
}
>> >> >> else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M;
}
>> >> >> else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M;
}
>> >> >> else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M;
}
>> >> >> else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M;
}
>> >> >> else if (lat >= 80.0M) { f = 69.38M; }
>> >> >>
>> >> >> return f;
>> >> >> }
>> >> >>
>> >> >>
>> >> >> Decimal MilesPerDegreeLatitude =
>> >> >> GetLatitudeMiles(zList[0].Latitude);
>> >> >> Decimal MilesPerDegreeLongitude = ((Decimal)
>> >> >> Math.Abs(Math.Cos((Double)
>> >> >> zList[0].Latitude))) * 24900.0M / 360.0M;
>> >> >>                         dRadius = 10.0M  // ten miles
>> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
>> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>> >> >>
>> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
>> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
>> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
>> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> >> >>
>> >> >> 2010/4/9 Onur AKTAS <onur.aktas@live.com>:
>> >> >> > ...
>> >> >> > I'm trying to find out how do you perform queries with
>> >> >> > calculations
>> >> >> > on
>> >> >> > the
>> >> >> > fly without inserting the data as calculated from the beginning.
>> >> >> > Lets say we have latitude and longitude coordinates of all
users
>> >> >> > and
>> >> >> > we
>> >> >> > have
>> >> >> >  Distance(from_lat, from_long, to_lat, to_long) function which
>> >> >> > gives distance between lat/longs pairs in kilometers.
>> >> >>
>> >> >> I'm not an expert, but I think that it boils down to "MapReduce"
and
>> >> >> "Hadoop".
>> >> >>
>> >> >> I don't think that there's any top-down tutorial on those two words,
>> >> >> you'll have to research yourself starting here:
>> >> >>
>> >> >>  * http://en.wikipedia.org/wiki/MapReduce
>> >> >>
>> >> >>  * http://hadoop.apache.org/
>> >> >>
>> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
>> >> >>
>> >> >> I don't think it is all documented in any one place yet...
>> >> >>
>> >> >>  Paul Prescod
>> >> >>
>> >> >
>> >> >
>> >
>> >
>
>

 

 


Mime
View raw message