incubator-cassandra-user mailing list archives

From Ryan King <r...@twitter.com>
Subject Re: How do you construct an index and use it, especially in Ruby
Date Mon, 26 Apr 2010 16:56:28 GMT
On Sun, Apr 25, 2010 at 11:14 AM, Bob Hutchison <hutch-lists@recursive.ca> wrote:
>
> Hi,
>
> I'm new to Cassandra and trying to work out how to do something that I've implemented any number of times (e.g. TokyoCabinet, Perst, even the filesystem using grep :-). I've managed to get some of this working in Cassandra, but not all.
>
> So here's the core of the situation.
>
> I have this opaque chunk of data that I want to store in Cassandra and then find it again.
>
> I can generate a key very easily when the data is created, and I've stored the data in a straightforward manner: in a column under that key, with the data as the value. I can retrieve it when I know the key. No difficulties here at all; it works fine.
>
> Now I want to index this data taking what I imagine to be a pretty typical approach.
>
> Let's say there are two many-to-one indexes: 'colour' and 'size'. Each colour value will have more than one chunk of data, and the same goes for size.
>
> What I thought I'd do is make a super column and index the chunk of data kind of like { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } }, with the key equal to the key of the chunk of data. Cassandra stores it like that without error. Using the Ruby gem, it'd be something along the lines of:
>
>  cassandra.insert(:Indexes, chunk_key, { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } })  # chunk_key: the key of the chunk of data
>
> Q1: Is this a reasonable approach? It *seems* to match what I've read is supposed to be done. The 1 is just a meaningless placeholder value. Anyway, it executes without error in Ruby.

No. In order to index your data, you need to invert it. Since you're
working in ruby I'd recommend CassandraObject:
http://github.com/nzKoz/cassandra_object. It has indexing built in.

-ryan
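A minimal sketch of what inverting the data can look like, assuming one possible layout (not necessarily the one CassandraObject uses): instead of storing the attributes under each chunk's key, keep one index row per attribute name ('colour', 'size'), one super column per value ('blue', 'large'), and the chunk keys as subcolumn names. Plain Ruby hashes stand in for the column family, so no cassandra gem calls are shown; CHUNK_ATTRIBUTES, invert and keys_for are hypothetical names.

```ruby
# Sketch only: nested Ruby Hashes stand in for a Cassandra super column family.
# Hypothetical sample data: chunk key => the attributes to index it by.
CHUNK_ATTRIBUTES = {
  'chunk-1' => { 'colour' => 'blue', 'size' => 'large' },
  'chunk-2' => { 'colour' => 'red',  'size' => 'large' },
  'chunk-3' => { 'colour' => 'blue', 'size' => 'small' }
}

# Invert the per-chunk attributes into:
#   row key       = index name  ('colour')
#   super column  = indexed value ('blue')
#   subcolumns    = keys of matching chunks (column values are unused)
# Cassandra would keep subcolumns sorted by its comparator; a Hash keeps
# insertion order instead.
def invert(chunk_attributes)
  index = Hash.new { |rows, name| rows[name] = Hash.new { |sc, value| sc[value] = {} } }
  chunk_attributes.each do |chunk_key, attrs|
    attrs.each do |name, value|
      index[name][value][chunk_key] = ''
    end
  end
  index
end

# Finding every 'blue' chunk is now a read of one super column in one row.
# fetch is used so that lookups never create empty entries.
def keys_for(index, name, value)
  index.fetch(name, {}).fetch(value, {}).keys
end

index = invert(CHUNK_ATTRIBUTES)
p keys_for(index, 'colour', 'blue') # => ["chunk-1", "chunk-3"]
```

With this layout the lookup no longer scans every chunk. The cost moves to write time: each chunk write also needs one index write per attribute, and changing a chunk's attributes means removing its old index entries.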

> Q2: What is the syntax of the (Ruby) query to find the keys of all 'blue' chunks of data? I'm assuming get_range is the correct method, but what are the parameters? The docs say get_range(column_family, options={}), but that seems to be missing some detail, in particular the super column name.
>
> Q2a: I know there are :start and :finish keys supported in the options hash (inclusive and exclusive, respectively). How do you define a range that matches equality on a UTF8 key? Surely not 'blue'.succ?? Or with some kind of suffix??
>
> Q2b: How do you specify the super column name 'colour'? I've looked at the (Ruby) source of the get_range method and I'm unconvinced that this is implemented (there seems to be a constant '' where the super column name ought to be).
>
> Anyway, I ended up hacking the Ruby gem's source to use the column name where the '' was in the original, and didn't get anywhere useful (I can find nothing, or everything, but nothing in between).
>
> Q3: If I am correct about what is supposed to be done, does the Ruby gem support it?
>
> Q4: Does anyone know of some Ruby code that does an indexed lookup that they could point me at? (There's lots of code that indexes, but nothing that searches by the index.)
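A minimal, self-contained sketch of the write-then-lookup cycle, including the "blue and large" case that two separate indexes imply. Everything here is hypothetical and in memory: IndexedStore stands in for the data column family plus its index rows, and each [field, value] entry plays the role of one index row. Combining conditions is a client-side set intersection, since each index read returns the keys for only one value.

```ruby
require 'set'

# Sketch only: hypothetical in-memory stand-in, not the cassandra gem
# or CassandraObject API.
class IndexedStore
  def initialize(indexed_fields)
    @indexed_fields = indexed_fields
    @chunks = {} # chunk_key => opaque data
    # [field, value] => Set of chunk keys; each entry plays one index row.
    @index = Hash.new { |h, k| h[k] = Set.new }
  end

  # Write the chunk, then one index entry per indexed field.
  def insert(chunk_key, data, attrs)
    @chunks[chunk_key] = data
    @indexed_fields.each do |field|
      value = attrs[field]
      @index[[field, value]] << chunk_key unless value.nil?
    end
  end

  # One index read per condition; AND them with a set intersection.
  def find_keys(conditions)
    sets = conditions.map { |field, value| @index.fetch([field, value], Set.new) }
    return [] if sets.empty?
    sets.reduce(:&).to_a.sort
  end

  def get(chunk_key)
    @chunks[chunk_key]
  end
end

store = IndexedStore.new(%w[colour size])
store.insert('chunk-1', 'opaque data 1', { 'colour' => 'blue', 'size' => 'large' })
store.insert('chunk-2', 'opaque data 2', { 'colour' => 'red', 'size' => 'large' })
store.insert('chunk-3', 'opaque data 3', { 'colour' => 'blue', 'size' => 'small' })

p store.find_keys({ 'colour' => 'blue' })                    # => ["chunk-1", "chunk-3"]
p store.find_keys({ 'colour' => 'blue', 'size' => 'large' }) # => ["chunk-1"]
```

Against a real store, insert would become the data write plus one index write per field, and find_keys one index read per condition followed by a get for each returned key.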
>
> I'll take a look at some of the other Cassandra client implementations and see if I can get this model to work. Maybe it's just a Ruby problem?? With any luck, it'll be me messing up.
>
> If it would help, I can post the source of what I have, but it'll need some cleanup. Let me know.
>
> Thanks for taking the time to read this far :-)
>
> Bob
>
> ----
> Bob Hutchison
> Recursive Design Inc.
> http://www.recursive.ca/
> weblog: http://xampl.com/so
>
