incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Index/Count/Order by syntax
Date Thu, 29 Jul 2010 22:20:07 GMT

Yes, but as I said it may not be the optimal design. You may end up with a single very
big row.

- you could use multiple rows, each holding a range of counts.

- you could use a standard CF and store the count in the row key, then use get_range_slices.
With the RandomPartitioner you will need to sort them yourself; with the Order Preserving
Partitioner they will be sorted for you.
e.g. {
 SearchLogs:
  999 : {word1:word1}
  998 : {word2 : word2}
}
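
A minimal sketch of that layout in plain Python (no Cassandra client involved), using the word counts from the original question. It assumes row keys are compared as strings under the order preserving partitioner, so counts are zero-padded to a fixed width to keep lexical and numeric order the same. The names `count_key`, `build_rows` and `KEY_WIDTH` are hypothetical, made up for this illustration.

```python
from collections import defaultdict

KEY_WIDTH = 10  # hypothetical fixed width, wide enough for the largest expected count


def count_key(count):
    """Zero-pad a count so that string order matches numeric order."""
    return str(count).zfill(KEY_WIDTH)


def build_rows(word_counts):
    """Group words under a row key derived from their count.

    Each row holds one column per word (column name == column value),
    mirroring the {999: {word1: word1}} layout above.
    """
    rows = defaultdict(dict)
    for word, count in word_counts.items():
        rows[count_key(count)][word] = word
    return dict(rows)


word_counts = {"cassandra": 100, "foo": 999, "bar": 1, "baz": 500, "fooz": 999}
rows = build_rows(word_counts)

# Walk the rows from highest count to lowest.
for key in sorted(rows, reverse=True):
    print(int(key), sorted(rows[key]))
```

Words that share a count ("foo" and "fooz") end up as two columns in the same row, which is also why a very common count can make that row large.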

get_range_slices over the RandomPartitioner has some performance issues when compared to the OrderPreservingPartitioner.
But I think the feature returns the same data, just out of order. Try some experiments and
see what happens.
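
If the rows do come back out of order, the client-side sort is small. A sketch in plain Python, assuming the range scan result has already been turned into (row key, columns) pairs with the count stored as the row key. `sort_by_count` and `in_count_range` are hypothetical helper names, not part of any Cassandra client.

```python
def sort_by_count(rows, descending=True):
    """Sort (row_key, columns) pairs by the numeric value of the row key."""
    return sorted(rows, key=lambda row: int(row[0]), reverse=descending)


def in_count_range(rows, low, high):
    """Keep only rows whose count lies in [low, high], preserving input order."""
    return [row for row in rows if low <= int(row[0]) <= high]


# Rows in the arbitrary order a random partitioner might return them.
unordered = [
    ("500", {"baz": "baz"}),
    ("999", {"foo": "foo", "fooz": "fooz"}),
    ("1", {"bar": "bar"}),
    ("100", {"cassandra": "cassandra"}),
]

ordered = sort_by_count(unordered)
mid_range = sort_by_count(in_count_range(unordered, 100, 500))
```

Sorting on `int(row[0])` rather than the raw string avoids "999" sorting after "1000" when keys are not zero-padded.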

Do you want to read back a portion of the index (e.g. words with 800 to 900 occurrences) or
the entire index?
Aaron


On 30 Jul, 2010, at 10:04 AM, Mark <static.void.dev@gmail.com> wrote:

> Ok so basically an "array" of words grouped by their count?
>
> Something like this?
>
> {
> SearchLogs : {
> ALL : {
> 999: { word1:word1, word2:word2, word3:word3 }
> 998: { word1:word1, word2:word2, word3:word3 }
> }
> }
> }
>
> On 7/29/10 2:50 PM, Aaron Morton wrote:
> > One method would be to use a Super Column Family. Have one row, in
> > that create a super column for each count value you have, and then in
> > the super column create a column for each word.
> >
> > Set the CompareWith for the super col to be LongType and the
> > CompareSubcolumnsWith to be AsciiType or UTF8Type.
> >
> > You could then use get_slice to read super columns in that row.
> >
> > This may not be the most efficient model, it will depend on how much
> > data you have and what your read patterns are like. Also remember
> > that pre 0.7 you cannot atomically increment counters in Cassandra.
> >
> > Have a play and see what works for you.
> >
> > Aaron
> >
> > On 29 Jul, 2010, at 02:36 PM, Mark <static.void.dev@gmail.com> wrote:
> >
> >> I know there is no native support for "order by", "group by" etc but I
> >> was wondering how it could be accomplished with some custom indexes?
> >>
> >> For example, say I have a list of word counts like (notice 2 words have
> >> the same count):
> >>
> >> "cassandra" => 100
> >> "foo" => 999
> >> "bar" => 1
> >> "baz" => 500
> >> "fooz" => 999
> >>
> >> How can I store and then retrieve these words ordered by their count/values?
> >>
> >> Thanks.
>
