cassandra-user mailing list archives

From Bob Hutchison <hutch-li...@recursive.ca>
Subject Re: How do you construct an index and use it, especially in Ruby
Date Wed, 28 Apr 2010 00:00:47 GMT


embedded response, way down below...

On 2010-04-26, at 12:56 PM, Ryan King wrote:

> On Sun, Apr 25, 2010 at 11:14 AM, Bob Hutchison
> <hutch-lists@recursive.ca> wrote:
>> 
>> Hi,
>> 
>> I'm new to Cassandra and trying to work out how to do something that I've
>> implemented any number of times (e.g. TokyoCabinet, Perst, even the
>> filesystem using grep :-) I've managed to get some of this working in
>> Cassandra but not all.
>> 
>> So here's the core of the situation.
>> 
>> I have this opaque chunk of data that I want to store in Cassandra and then
>> find it again.
>> 
>> I can generate a key when the data is created very easily, and I've stored
>> it in a straightforward manner: in a column with a key whose value is the
>> data. And I can retrieve it when I know the key. No difficulties here at
>> all, works fine.
>> 
>> Now I want to index this data taking what I imagine to be a pretty typical approach.
>> 
>> Let's say there are two many-to-one indexes: 'colour' and 'size'. Each
>> colour value will have more than one chunk of data, and the same goes for
>> size.
>> 
>> What I thought I'd do is make a super column and index the chunk of data
>> kind of like: { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } }
>> with the key equal to the key of the chunk of data. And Cassandra stores it
>> without error like that. So using the Ruby gem, it'd be something along the
>> lines of:
>> 
>> cassandra.insert(:Indexes, key_of_the_chunk_of_data,
>>                  { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } })
>> 
>> Q1: Is this a reasonable approach? It *seems* to match what I've read is
>> supposed to be done. The 1 is meaningless. Anyway, it executes without
>> error in Ruby.
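For comparison, the layout in Q1 can be modelled with plain Python dicts (a stand-in for the super column family, not a real Cassandra client): one row per chunk, keyed by the chunk key. Because the colour value lives inside each row, answering "which chunks are blue?" means visiting every row, which is essentially what a get_range over the whole column family does. The row keys and the helper keys_with are hypothetical.

```python
# Stand-in for the 'Indexes' super column family as laid out in Q1:
# row key (chunk key) -> super column ('colour', 'size') -> {value: 1}
rows = {
    "chunk-1": {"colour": {"blue": 1}, "size": {"large": 1}},
    "chunk-2": {"colour": {"blue": 1}, "size": {"small": 1}},
    "chunk-3": {"colour": {"red": 1}, "size": {"large": 1}},
}


def keys_with(rows, super_column, value):
    """Full scan: check every row for `value` under `super_column`."""
    return sorted(
        key
        for key, super_columns in rows.items()
        if value in super_columns.get(super_column, {})
    )


print(keys_with(rows, "colour", "blue"))  # ['chunk-1', 'chunk-2']
```

The cost of this lookup grows with the total number of chunks rather than with the number of matches.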
> 
> No. In order to index your data, you need to invert it. Since you're
> working in ruby I'd recommend CassandraObject:
> http://github.com/nzKoz/cassandra_object. It has indexing built in.
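Ryan's point about inverting the data can be sketched with dicts standing in for Cassandra. The layout is an assumption (one common choice, not necessarily what cassandra_object does): the row key is the attribute name, the super column is the attribute value, and the column names are the chunk keys. A lookup then reads one super column from one row instead of scanning every row. make_index, index_chunk and lookup are hypothetical helpers.

```python
from collections import defaultdict


def make_index():
    # Stand-in for an inverted super column family:
    # row key (attribute name) -> super column (attribute value)
    #   -> column name (chunk key) -> placeholder value
    return defaultdict(lambda: defaultdict(dict))


def index_chunk(index, chunk_key, attributes):
    """File one chunk under each of its attribute values."""
    for attribute, value in attributes.items():
        # The chunk key is stored as a column *name*; the value is unused.
        index[attribute][value][chunk_key] = ""


def lookup(index, attribute, value):
    """Return the chunk keys filed under attribute == value."""
    # .get() avoids creating empty entries in the defaultdicts.
    return sorted(index.get(attribute, {}).get(value, {}))


index = make_index()
index_chunk(index, "chunk-1", {"colour": "blue", "size": "large"})
index_chunk(index, "chunk-2", {"colour": "blue", "size": "small"})
index_chunk(index, "chunk-3", {"colour": "red", "size": "large"})
print(lookup(index, "colour", "blue"))  # ['chunk-1', 'chunk-2']
```

Against a real cluster, the equivalent read would fetch row 'colour', super column 'blue', and use the returned column names as chunk keys; storing a chunk then also means writing one index column per indexed attribute.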

Thanks Ryan. I don't really want to add a lot of layers of abstraction here, since what I'm
writing is itself an abstraction. Worse, I can't get cassandra_object to install (some kind
of gem issue). Anyway...

I dusted off my 20-year-old experience with Python (i.e. with the help of Google), downloaded
and installed pycassa (and Thrift itself) and played around a bit. I find that the following
Python/pycassa snippet works just fine (or well enough).

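import pycassa

# Connect to Cassandra with pycassa's default connection settings
client = pycassa.connect()
# The 'Indexes' super column family in the 'Play' keyspace
indexes_scf = pycassa.ColumnFamily(client, 'Play', 'Indexes', super=True)
# Scan every row, asking only for the 'blue' column inside the 'colour' super column
rows = list(indexes_scf.get_range(column_start='blue', column_finish='blue', super_column='colour'))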
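The pycassa call appears to rely on a column slice including both of its bounds, so setting column_start and column_finish to the same name selects exactly that column within the 'colour' super column of each row. The sketch below models only that selection step over sorted column names, with Python's string ordering standing in for the column family's comparator; slice_columns is a hypothetical helper, not part of pycassa or Thrift.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, start="", finish=""):
    """Select names with start <= name <= finish ('' leaves that end open).

    Models an inclusive slice over column names kept in sorted order.
    """
    lo = bisect_left(sorted_names, start) if start else 0
    hi = bisect_right(sorted_names, finish) if finish else len(sorted_names)
    return sorted_names[lo:hi]


colour_columns = ["blue", "green", "red"]
print(slice_columns(colour_columns, "blue", "blue"))  # ['blue']
print(slice_columns(colour_columns, "c", "r"))        # ['green']
```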

The data was inserted using Ruby, but not read, because, as I said (below now), I don't know
how to write the equivalent of the indexes_scf.get_range call in the snippet. So, a simpler
question: how do you write the equivalent in Ruby using the cassandra gem?

Cheers,
Bob

> 
> -ryan
> 
>> Q2: What is the syntax of the (Ruby) query to find the keys of all 'blue'
>> chunks of data? I'm assuming get_range is the correct method, but what are
>> the parameters? The docs say: get_range(column_family, options={}) but that
>> seems to be missing a bit of detail, in particular the super column name.
>> 
>> Q2a: I know there are :start and :finish keys supported in the options
>> hash, inclusive and exclusive respectively. How do you define a range for
>> equality with a UTF8 key? Surely not 'blue'.succ?? Or by some kind of
>> suffix??
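If the finish bound really is exclusive, as Q2a reads the Ruby docs, an equality range needs no successor of the whole string: the smallest string that sorts after 'blue' is 'blue' followed by a NUL character, so the half-open range ['blue', 'blue\0') contains only 'blue'. For a prefix match, bumping the last character of the prefix gives the exclusive end. This is a sketch of plain lexicographic ordering in Python, not of any Cassandra client; exact_range, prefix_range and in_half_open are hypothetical helpers.

```python
def exact_range(value):
    """Half-open [start, end) that contains only `value`."""
    # value + "\x00" is the smallest string that sorts after value.
    return value, value + "\x00"


def prefix_range(prefix):
    """Half-open [start, end) containing every string that starts with prefix."""
    # Assumes a non-empty prefix whose last character is not the maximum code point.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def in_half_open(names, bounds):
    """Filter names to those with start <= name < end."""
    start, end = bounds
    return [n for n in names if start <= n < end]


names = ["blu", "blue", "blue-green", "bluf", "red"]
print(in_half_open(names, exact_range("blue")))   # ['blue']
print(in_half_open(names, prefix_range("blue")))  # ['blue', 'blue-green']
```

With inclusive bounds on both ends, as the pycassa snippet suggests for column slices, no successor is needed at all: start and finish can both be 'blue'.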
>> 
>> Q2b: How do you specify the super column name 'colour'? Looking at the
>> (Ruby) source of the get_range method, I'm unconvinced that this is
>> implemented (there seems to be a constant '' used where the super column
>> name would make sense.)
>> 
>> Anyway, I ended up hacking the Ruby gem's source to use the column name
>> where the '' was in the original, and didn't really get anywhere useful (I
>> can find nothing, or everything, but nothing in between).
>> 
>> Q3: If I am correct about what is supposed to be done, does the Ruby gem
>> support it?
>> 
>> Q4: Does anyone know of some Ruby code that does an indexed lookup that they
>> could point me to? (Lots of code indexes, but nothing searches by the index.)
>> 
>> I'll try to take a look at some of the other Cassandra client
>> implementations and see if I can get this model to work. Maybe just a Ruby
>> problem?? With any luck, it'll be me messing up.
>> 
>> If it'd help, I can post the source of what I have, but it'll need some
>> cleanup. Let me know.
>> 
>> Thanks for taking the time to read this far :-)
>> 
>> Bob
>> 
>> ----
>> Bob Hutchison
>> Recursive Design Inc.
>> http://www.recursive.ca/
>> weblog: http://xampl.com/so
>> 
>> 
