incubator-cassandra-user mailing list archives

From Ed Anuff ...@anuff.com>
Subject Re: Homebrew CF-indexing vs secondary indexing
Date Fri, 25 Feb 2011 19:18:54 GMT
It's nice to see some testing in this regard. However, it's worth pointing
out something that gets lost in CF index vs secondary index discussions.
What you're really proving is that get_slice (across columns) is faster than
get_indexed_slices (across keys).  Up to a certain size (and it would be
nice if there were some empirical testing to determine what that size is),
get_slice should be one of the most performant operations Cassandra can do.
CF index approaches are basically all about getting your data into a
situation where you can use get_slice to quickly perform the search.  The
reasons for using Cassandra's built-in secondary index support, IMHO, are
that (1) it's easy to use, whereas CF indexes are managed by the client, and
(2) there's concern about how large an index you'd be able to effectively
store in a CF index row.  The first point is more about Cassandra being
easier for newcomers; the second is something I'd like to see some
more data around.  Maybe you want to run your tests up to much larger sizes
and see if there's a point where the results change?  FWIW, I recently
switched back to CF-based indexes from secondary indexes, largely for the
flexibility in the types of queries that became possible, but it's nice to
see there's some performance benefit.  The other thing that would be good to
look at is timing the overhead of what it takes to update your index as you
change the values that are being indexed.
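
To make the two points above concrete (a lookup is a slice over sorted
columns in one index row, and every change to an indexed value costs extra
index writes), here is a rough sketch.  It is plain in-memory Python, not
Cassandra or Hector client code: CFIndexRow, update_indexed_value and the
(value, row key) column naming are hypothetical names chosen for
illustration, and a real client would issue these as reads and mutations
against an index column family.

```python
import bisect


class CFIndexRow:
    """In-memory stand-in for one wide row of an index column family.

    Assumption: each column name is an (indexed_value, row_key) pair and
    columns are kept sorted, mimicking a comparator-ordered Cassandra row.
    """

    def __init__(self):
        self._columns = []  # sorted list of (indexed_value, row_key)

    def insert(self, value, row_key):
        col = (value, row_key)
        i = bisect.bisect_left(self._columns, col)
        if i == len(self._columns) or self._columns[i] != col:
            self._columns.insert(i, col)

    def remove(self, value, row_key):
        col = (value, row_key)
        i = bisect.bisect_left(self._columns, col)
        if i < len(self._columns) and self._columns[i] == col:
            del self._columns[i]

    def get_slice(self, value):
        """Return the row keys indexed under `value` with one contiguous scan."""
        # (value,) sorts before every (value, row_key), so this finds the
        # first column for `value`, if any.
        start = bisect.bisect_left(self._columns, (value,))
        keys = []
        for v, k in self._columns[start:]:
            if v != value:
                break
            keys.append(k)
        return keys


def update_indexed_value(data_rows, index_row, row_key, new_value):
    """Change an indexed value and keep the index row consistent.

    The update overhead is visible here: one read of the old value, one
    index delete, one index insert, and the data write itself.
    """
    old_value = data_rows.get(row_key)          # read before write
    if old_value is not None:
        index_row.remove(old_value, row_key)    # drop the stale index column
    index_row.insert(new_value, row_key)        # add the new index column
    data_rows[row_key] = new_value              # write the data row


data_rows, index_row = {}, CFIndexRow()
update_indexed_value(data_rows, index_row, "row1", "THS")
update_indexed_value(data_rows, index_row, "row2", "TRS")
update_indexed_value(data_rows, index_row, "row3", "THS")
print(index_row.get_slice("THS"))  # ['row1', 'row3']
```

The row keys returned by the slice would then be used to fetch the actual
data rows, and timing update_indexed_value against a plain write is one way
to measure the index maintenance overhead.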



On Fri, Feb 25, 2011 at 10:23 AM, Ron Siemens <rsiemens@greatergood.com> wrote:

>
> I updated the cassandra version in the hector package from 0.7.0 to 0.7.2.  The
> occasional slow-down in the CF-index went away.  I then upped the heap to
> 512MB, and the secondary-indexing then works.  Seems awfully memory hungry
> for my small dataset.  Even the CF-index was faster with more heap.  These
> are the times with Cassandra-0.7.2 and 512M heap.  Slightly different
> testing: I'm varying the index used, which gives different data size results.
>  It still surprises me that the CF index does substantially better.
>
> Secondary Index
>
> DEBUG Retrieved THS / 7293 rows, in 1051 ms
> DEBUG Retrieved TRS / 7289 rows, in 1448 ms
> DEBUG Retrieved BCS / 7788 rows, in 1553 ms
> DEBUG Retrieved ARS / 7426 rows, in 1479 ms
> DEBUG Retrieved CHS / 7290 rows, in 1575 ms
> DEBUG Retrieved MS / 4523 rows, in 766 ms
> DEBUG Retrieved PRS / 562 rows, in 40 ms
> DEBUG Retrieved GGF / 1162 rows, in 122 ms
> DEBUG Retrieved VET / 7313 rows, in 1193 ms
> DEBUG Retrieved AUT / 7287 rows, in 1746 ms
> DEBUG Retrieved LIT / 7291 rows, in 1331 ms
>
> CF Index
>
> DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
> DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
> DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
> DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
> DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
> DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
> DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
> DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
> DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
> DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
> DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms
>
> On Feb 24, 2011, at 3:39 PM, Ron Siemens wrote:
>
> >
> > I failed to mention: this is just doing repeated data retrievals using
> > the index.
> >
> >> ...
> >>
> >> Sample run: Secondary index.
> >>
> >> DEBUG Retrieved THS / 7293 rows, in 2012 ms
> >> DEBUG Retrieved THS / 7293 rows, in 1956 ms
> >> DEBUG Retrieved THS / 7293 rows, in 1843 ms
> > ...
> >
>
>
