cassandra-user mailing list archives

From vineet daniel <vineetdan...@gmail.com>
Subject Re: How to perform queries on Cassandra?
Date Sun, 11 Apr 2010 19:10:59 GMT
I assume that with the key I can get all the columns back as an array. I'd
then be using PHP to extract each key => value pair from that array, and I
just want to avoid that step, i.e. print the column names directly. If you
guys think it's not a good idea I can drop it; anyway, I'm new to this and a
lot of things are coming to mind. As far as Cassandra and column families /
super columns are concerned, I am pretty clear.
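
To make the data model under discussion concrete, here is a minimal sketch
of the "index row" pattern suggested further down the thread for non-unique
usernames. It is an in-memory model only: plain Python dicts stand in for
column families, and every name in it (users, users_by_name, add_user and so
on) is hypothetical rather than part of any Cassandra client API. In the
index rows the column names themselves are the data of interest (the user
ids), so they can be read directly once the row is fetched by key.

```python
# In-memory model only: plain dicts stand in for column families.
# All names below are hypothetical, not a Cassandra client API.

users = {}          # "Users" CF: row key (unique id) -> {column name: value}
users_by_name = {}  # index CF:   row key (username)  -> {user id: ""}


def add_user(user_id, username, password):
    """Write the user row and the matching index entry together."""
    users[user_id] = {"username": username, "password": password}
    # The user id is stored as a column NAME; the value is left empty.
    users_by_name.setdefault(username, {})[user_id] = ""


def ids_for_username(username):
    """Fetch the index row by key; its column names are the user ids."""
    return sorted(users_by_name.get(username, {}))


def passwords_for_username(username):
    """Follow each id from the index row back to the unique user row."""
    return [users[uid]["password"] for uid in ids_for_username(username)]


# Two users share a username; a third does not.
add_user("1", "vineet", "123456")
add_user("2", "vineet", "abcdef")
add_user("3", "dop", "xyz")

print(ids_for_username("vineet"))
print(passwords_for_username("vineet"))
```

Every lookup here is by row key, which is what the API supports; each
further query pattern would get its own index rows, written at the same time
as the user row.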

On Mon, Apr 12, 2010 at 12:23 AM, Benjamin Black <b@b3k.us> wrote:

> I have no idea what problem you are trying to solve.  You are
> misunderstanding a number of things about the Cassandra data model and
> about how we are explaining it is best used.
>
> On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel <vineetdaniel@gmail.com>
> wrote:
> > Well, my initial idea is to use the value as the column name, keeping the
> > key as an incremental integer. The discussion has drifted from this point
> > with each mail, so I will put it again.
> >
> > We want to store user information. We keep 1, 2, 3, 4 and so on as keys,
> > and values as column names, i.e. rather than using the column name
> > 'first name' I'd be using 'vineet' as the column name, and rather than
> > 'last name' I'd be using 'daniel'. This way I can directly read column
> > names as values. This is just a thought that came to mind while trying
> > to design my db for Cassandra.
> >
> >
> >
> > On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black <b@b3k.us> wrote:
> >>
> >> Row keys must be unique.  If your usernames are not unique and you
> >> want to be able to query on them, you either need to figure out a way
> >> to make them unique or treat the username rows themselves as indices,
> >> which refer to a set of actually unique identifiers for users.
> >>
> >> On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vineetdaniel@gmail.com>
> >> wrote:
> >> > It's not a problem, it's a scenario which we need to handle. All I am
> >> > trying to do is achieve what is not there in the API, i.e. a workaround.
> >> >
> >> > On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b@b3k.us> wrote:
> >> >>
> >> >> A system that permits multiple people to have the same username has a
> >> >> serious problem.
> >> >>
> >> >> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vineetdaniel@gmail.com>
> >> >> wrote:
> >> >> > How do we handle identical usernames? Otherwise it seems fine to me.
> >> >> >
> >> >> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <sunht@dopsun.com> wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> As far as I can see, the Cassandra API currently supports
> >> >> >> criteria on:
> >> >> >>
> >> >> >> Token - Key - Super Column Name (if applicable) - Column Names
> >> >> >>
> >> >> >> I guess Token is not usually used for day-to-day queries, so Key
> >> >> >> and Column Names are normally used for querying. For the user
> >> >> >> name and password case, I guess it can be done like this:
> >> >> >>
> >> >> >> Define a CF named UserAuth with type Super, where the key is the
> >> >> >> user name and the password is the super column name. Then, when
> >> >> >> you receive the user name and password from the UI (or by any
> >> >> >> other method), it can be queried via multiget_slice or
> >> >> >> get_range_slices; if anything is returned, the user name and
> >> >> >> password match.
> >> >> >>
> >> >> >> If you don't use the super column name and put the password in
> >> >> >> as the column name instead: column names are usually not used
> >> >> >> for this kind of discretionary value (actually, I don’t see any
> >> >> >> definitive documents on how to use column names and super
> >> >> >> columns; flexibility is the good of Cassandra, or is it bad if
> >> >> >> abused? :P)
> >> >> >>
> >> >> >> Not sure whether this is the best way, but I guess it will work.
> >> >> >>
> >> >> >> Regards,
> >> >> >>
> >> >> >> Dop
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
> >> >> >> Sent: Sunday, April 11, 2010 5:33 PM
> >> >> >> To: user@cassandra.apache.org
> >> >> >> Subject: Re: How to perform queries on Cassandra?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Hi Benjamin
> >> >> >>
> >> >> >> I'll try to make it clearer.
> >> >> >> We have a user table with fields 'id', 'username', and 'password'.
> >> >> >> Now if we use the usual way to store key/value, like:
> >> >> >> username : vineetdaniel
> >> >> >> timestamp
> >> >> >> password : <password>
> >> >> >> timestamp
> >> >> >>
> >> >> >> second user:
> >> >> >>
> >> >> >> username : <seconduser>
> >> >> >> timestamp
> >> >> >> password : <password>
> >> >> >>
> >> >> >> and so on, what I assume is that since we cannot search on values
> >> >> >> (as confirmed by guys on the Cassandra forums), we are not able
> >> >> >> to perform robust 'where' queries. Now what I propose is this.
> >> >> >>
> >> >> >> Rather than using static values for column names, use the values
> >> >> >> themselves, with a unique key as the identifier. So the above
> >> >> >> example, as I would put it, becomes:
> >> >> >>
> >> >> >> vineetdaniel : vineetdaniel
> >> >> >> timestamp
> >> >> >>
> >> >> >> <password> : <password>
> >> >> >> timestamp
> >> >> >>
> >> >> >> second user:
> >> >> >> seconduser : seconduser
> >> >> >> timestamp
> >> >> >>
> >> >> >> password : password
> >> >> >> timestamp
> >> >> >>
> >> >> >> With this approach we can simply search on the keys themselves
> >> >> >> rather than using different CFs. To add further, this cannot be
> >> >> >> used for every situation. I am still exploring this, and will
> >> >> >> soon be updating the group and my blog with information
> >> >> >> pertaining to it. As Cassandra is new, I think every idea or
> >> >> >> experience should be shared with the community.
> >> >> >>
> >> >> >> I hope my example is clear this time. Should you have any
> >> >> >> queries, feel free to write back.
> >> >> >>
> >> >> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b@b3k.us> wrote:
> >> >> >>
> >> >> >> Sorry, I don't understand your example.
> >> >> >>
> >> >> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
> >> >> >> <vineetdaniel@gmail.com> wrote:
> >> >> >> > Benjamin, I quite agree with you, but what about duplicate
> >> >> >> > usernames, supposing I am not using unique names such as
> >> >> >> > email ids? If usernames are duplicated we cannot use them as
> >> >> >> > the key, so what should the solution be? I am thinking of
> >> >> >> > keeping an incremental numeric id as the key and keeping the
> >> >> >> > name and value the same in the column family.
> >> >> >> >
> >> >> >> > Example:
> >> >> >> > User1 has password 123456
> >> >> >> >
> >> >> >> > Cassandra structure:
> >> >> >> >
> >> >> >> > 1 as key
> >> >> >> >            user1 - column name
> >> >> >> >            value - user1
> >> >> >> >            123456 - column name
> >> >> >> >            value - 123456
> >> >> >> >
> >> >> >> > I'm thinking of doing it this way for my application; this
> >> >> >> > way I can run different sorts of queries too. Any feedback on
> >> >> >> > this is welcome.
> >> >> >> >
> >> >> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b@b3k.us> wrote:
> >> >> >> >>
> >> >> >> >> You would have a Column Family, not a column for that; let's
> >> >> >> >> call it the Users CF.  You'd use username as the row key and
> >> >> >> >> have a column called 'password'.  For your example query,
> >> >> >> >> you'd retrieve row key 'usr2', column 'password'.  The
> >> >> >> >> general pattern is that you create CFs to act as indices for
> >> >> >> >> each query you want to perform.  There is no equivalent to a
> >> >> >> >> relational store to perform arbitrary queries.  You must
> >> >> >> >> structure things to permit the queries of interest.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> b
> >> >> >> >>
> >> >> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <sikerasakti@gmail.com>
> >> >> >> >> wrote:
> >> >> >> >> > I have already read the API specification. Honestly, I do
> >> >> >> >> > not understand how to use it, because there are no
> >> >> >> >> > examples.
> >> >> >> >> >
> >> >> >> >> > For example I have a column like this:
> >> >> >> >> >
> >> >> >> >> > UserName    Password
> >> >> >> >> > usr1        abc
> >> >> >> >> > usr2        xyz
> >> >> >> >> > usr3        opm
> >> >> >> >> >
> >> >> >> >> > Suppose I want to query the user's password using SQL in an
> >> >> >> >> > RDBMS:
> >> >> >> >> >
> >> >> >> >> >       SELECT Password FROM Users WHERE UserName = 'usr2';
> >> >> >> >> >
> >> >> >> >> > Now I want to get the password using an OODBMS, with the
> >> >> >> >> > DB4o Object Query and Java:
> >> >> >> >> >
> >> >> >> >> >      ObjectSet<Users> queryResult = db.query(new Predicate<Users>()
> >> >> >> >> >      {
> >> >> >> >> >             public boolean match(Users myUser)
> >> >> >> >> >             {
> >> >> >> >> >                  // equals() compares contents; == compares references
> >> >> >> >> >                  return myUser.getUserName().equals("usr2");
> >> >> >> >> >             }
> >> >> >> >> >      });
> >> >> >> >> >
> >> >> >> >> > After we get the Users instance in the query result, we can
> >> >> >> >> > get usr2's password.
> >> >> >> >> >
> >> >> >> >> > How do we perform this query using the Cassandra API and
> >> >> >> >> > Java? Would you tell me please?  Thank you.
> >> >> >> >> >
> >> >> >> >> > Dir.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod
> >> >> >> >> > <paul@prescod.net>
> >> >> >> >> > wrote:
> >> >> >> >> >>
> >> >> >> >> >> No. Cassandra has an API.
> >> >> >> >> >>
> >> >> >> >> >> http://wiki.apache.org/cassandra/API
> >> >> >> >> >>
> >> >> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir
> >> >> >> >> >> <sikerasakti@gmail.com>
> >> >> >> >> >> wrote:
> >> >> >> >> >> > Does Cassandra have a default query language, such as SQL
> >> >> >> >> >> > in an RDBMS or Object Query in an OODBMS?  Thank you.
> >> >> >> >> >> >
> >> >> >> >> >> > Dir.
> >> >> >> >> >> >
> >> >> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
> >> >> >> >> >> > <malsmith@treehousesystems.com>
> >> >> >> >> >> > wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> It's sort of an interesting problem. In an RDBMS one
> >> >> >> >> >> >> relatively simple approach would be to calculate a
> >> >> >> >> >> >> rectangle that is X km by Y km with User 1's location at
> >> >> >> >> >> >> the center.  So the rectangle runs from
> >> >> >> >> >> >> (UserX-10KmX, UserY-10KmY) to (UserX+10KmX, UserY+10KmY).
> >> >> >> >> >> >>
> >> >> >> >> >> >> Then you could query the database for all other users
> >> >> >> >> >> >> where each user considered satisfies
> >> >> >> >> >> >> curUserX > UserX-10KmX and curUserX < UserX+10KmX and
> >> >> >> >> >> >> curUserY > UserY-10KmY and curUserY < UserY+10KmY.
> >> >> >> >> >> >> * Note that 10KmX and 10KmY are really a translation from
> >> >> >> >> >> >> kilometers to degrees of latitude and longitude (which you
> >> >> >> >> >> >> can find with a Google search).
> >> >> >> >> >> >>
> >> >> >> >> >> >> With the right indexes this query actually runs pretty
> >> >> >> >> >> >> well.
> >> >> >> >> >> >>
> >> >> >> >> >> >> Translating that to Cassandra seems a bit complex at
> >> >> >> >> >> >> first, but you could try something like pre-calculating a
> >> >> >> >> >> >> grid with the right resolution (like a square of 5 km per
> >> >> >> >> >> >> side) and assigning every user to a particular grid ID.
> >> >> >> >> >> >> That way you just calculate which grid ID User1 is in,
> >> >> >> >> >> >> then do a direct key lookup to get a list of the users in
> >> >> >> >> >> >> that same grid ID.
> >> >> >> >> >> >>
> >> >> >> >> >> >> A second approach would be to have two column families:
> >> >> >> >> >> >> one that maps a latitude to a list of users who are at
> >> >> >> >> >> >> that latitude, and a second that maps a longitude to the
> >> >> >> >> >> >> users who are at that longitude.  You could do the same
> >> >> >> >> >> >> rectangle calculation as above, then do a get_slice range
> >> >> >> >> >> >> lookup to get a list of users from the range of latitudes
> >> >> >> >> >> >> and a second list from the range of longitudes.  You would
> >> >> >> >> >> >> then need to do an in-memory nested loop to find the users
> >> >> >> >> >> >> that are in both lists.  This second approach could cause
> >> >> >> >> >> >> some trouble depending on where you search and how many
> >> >> >> >> >> >> users you really have: some latitudes and longitudes have
> >> >> >> >> >> >> many, many people in them.
> >> >> >> >> >> >>
> >> >> >> >> >> >> So, it seems some version of a chunking / grid ID thing
> >> >> >> >> >> >> would be the better approach.  If you let people zoom in
> >> >> >> >> >> >> or zoom out, you could just have different column families
> >> >> >> >> >> >> for each level of zoom.
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
> >> >> >> >> >> >>
> >> >> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
> >> >> >> >> >> >> {
> >> >> >> >> >> >>     // Approximate miles per degree of latitude, by 10-degree band.
> >> >> >> >> >> >>     Decimal f = 0.0M;
> >> >> >> >> >> >>     lat = Math.Abs(lat);
> >> >> >> >> >> >>     f = 68.99M;
> >> >> >> >> >> >>     if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
> >> >> >> >> >> >>     else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
> >> >> >> >> >> >>     else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
> >> >> >> >> >> >>     else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
> >> >> >> >> >> >>     else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
> >> >> >> >> >> >>     else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
> >> >> >> >> >> >>     else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
> >> >> >> >> >> >>     else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
> >> >> >> >> >> >>     else if (lat >= 80.0M) { f = 69.38M; }
> >> >> >> >> >> >>
> >> >> >> >> >> >>     return f;
> >> >> >> >> >> >> }
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
> >> >> >> >> >> >> // Math.Cos expects radians, so convert the latitude from degrees first.
> >> >> >> >> >> >> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(
> >> >> >> >> >> >>     Math.Cos((Double) zList[0].Latitude * Math.PI / 180.0))) * 24900.0M / 360.0M;
> >> >> >> >> >> >> dRadius = 10.0M;  // ten miles
> >> >> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
> >> >> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
> >> >> >> >> >> >>
> >> >> >> >> >> >> // Bounding box around the first point in zList.
> >> >> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
> >> >> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
> >> >> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
> >> >> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> 2010/4/9 Onur AKTAS <onur.aktas@live.com>:
> >> >> >> >> >> >> > ...
> >> >> >> >> >> >> > I'm trying to find out how you perform queries with
> >> >> >> >> >> >> > calculations on the fly, without inserting the data as
> >> >> >> >> >> >> > calculated from the beginning. Let's say we have
> >> >> >> >> >> >> > latitude and longitude coordinates of all users and we
> >> >> >> >> >> >> > have a Distance(from_lat, from_long, to_lat, to_long)
> >> >> >> >> >> >> > function which gives the distance between lat/long
> >> >> >> >> >> >> > pairs in kilometers.
> >> >> >> >> >> >>
> >> >> >> >> >> >> I'm not an expert, but I think that it boils down to
> >> >> >> >> >> >> "MapReduce" and "Hadoop".
> >> >> >> >> >> >>
> >> >> >> >> >> >> I don't think that there's any top-down tutorial on those
> >> >> >> >> >> >> two words, you'll have to research yourself starting here:
> >> >> >> >> >> >>
> >> >> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
> >> >> >> >> >> >>
> >> >> >> >> >> >>  * http://hadoop.apache.org/
> >> >> >> >> >> >>
> >> >> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
> >> >> >> >> >> >>
> >> >> >> >> >> >> I don't think it is all documented in any one place yet...
> >> >> >> >> >> >>
> >> >> >> >> >> >>  Paul Prescod
> >> >> >> >> >> >>
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >
> >> >
> >> >
> >
> >
>
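
The grid ID idea quoted earlier in this thread (pre-calculate a grid of
about 5 km per side, assign every user to a cell, then find nearby users
with a direct key lookup) might look roughly like the sketch below. It is
an in-memory illustration only: a dict stands in for a column family keyed
by grid ID, the 111 km per degree of latitude figure is a rough assumption,
and all names (grid_id, assign, nearby) are hypothetical.

```python
import math
from collections import defaultdict

CELL_KM = 5.0            # cell size per side, as suggested in the thread
KM_PER_DEG_LAT = 111.0   # rough average; an assumption of this sketch


def grid_id(lat, lon, cell_km=CELL_KM):
    """Map a lat/long pair (in degrees) to a (row, col) grid cell."""
    row = math.floor(lat * KM_PER_DEG_LAT / cell_km)
    # Use the southern edge of the row's band so every user in the same
    # row shares one longitude scale. math.cos expects radians.
    band_lat = row * cell_km / KM_PER_DEG_LAT
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(band_lat))
    col = math.floor(lon * km_per_deg_lon / cell_km)
    return (row, col)


# Stand-in for a column family: row key = grid ID, columns = user names.
users_by_cell = defaultdict(set)
cell_of_user = {}


def assign(user, lat, lon):
    """Store the user under the grid cell that contains the position."""
    cell = grid_id(lat, lon)
    users_by_cell[cell].add(user)
    cell_of_user[user] = cell


def nearby(user):
    """Direct key lookup: everyone else in the same grid cell."""
    return users_by_cell[cell_of_user[user]] - {user}


assign("a", 40.0001, -74.0001)
assign("b", 40.0002, -74.0002)
assign("c", 48.85, 2.35)

print(nearby("a"))
print(nearby("c"))
```

A single-cell lookup misses neighbours that sit just across a cell border;
looking up the eight surrounding cells as well is one common way around
that, at the cost of nine key lookups instead of one.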
