cassandra-commits mailing list archives

From "Jean-Francois Im (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2904) get_range_slices with no columns could be made faster by scanning the index file
Date Sat, 16 Jul 2011 19:29:59 GMT


Jean-Francois Im commented on CASSANDRA-2904:

I forgot to mention that I am interested in writing a patch for this. I implemented something
quick and dirty on my end to get an idea of the performance improvement, but it assumes that
nothing else is going on at the same moment (i.e., nobody else is writing, the consistency
level is always ONE, no compaction is running, only one client is issuing
this kind of query, etc.).

Writing something more general-purpose would be trickier, and I would probably need some pointers
on a few things (mostly how to handle compaction, query cursors, and a consistency level other than
ONE), but it sounds really fun. Is there any interest in this?

> get_range_slices with no columns could be made faster by scanning the index file
> --------------------------------------------------------------------------------
>                 Key: CASSANDRA-2904
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Jean-Francois Im
> When scanning a column family using get_range_slices() and a predicate that contains
> no columns, the scan operates on the actual data, not the index file.
> Our use case for this is that we have a column family with relatively wide rows (varying
> from 10 KB to over 100 KB of data per row) and we need to iterate through all the keys to
> figure out which rows we are interested in; obviously, going through the index file rather than the
> data is faster in this case (on the order of minutes versus hours).
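
As a rough illustration of why a key-only scan is cheaper against an index than against the data, here is a minimal sketch. It assumes a toy index layout (2-byte big-endian key length, key bytes, 8-byte data-file offset); that layout is a hypothetical simplification for this example and is not claimed to match Cassandra's on-disk format. The names `write_index` and `iter_keys` are made up for the sketch.

```python
import io
import struct


def write_index(entries):
    """Serialize (key, data_offset) pairs into the toy index layout.

    Assumed layout per entry (hypothetical, for illustration only):
    2-byte big-endian key length, key bytes, 8-byte big-endian offset.
    """
    buf = io.BytesIO()
    for key, offset in entries:
        buf.write(struct.pack(">H", len(key)))
        buf.write(key)
        buf.write(struct.pack(">q", offset))
    return buf.getvalue()


def iter_keys(index_stream):
    """Yield every row key by reading index entries only.

    The row data itself is never touched, so the bytes read per row
    depend on the key size, not on the row size.
    """
    while True:
        header = index_stream.read(2)
        if not header:  # clean end of the index
            return
        if len(header) < 2:
            raise ValueError("truncated index entry")
        (key_len,) = struct.unpack(">H", header)
        key = index_stream.read(key_len)
        offset_bytes = index_stream.read(8)  # read past the offset; unused here
        if len(key) != key_len or len(offset_bytes) != 8:
            raise ValueError("truncated index entry")
        yield key


# Two rows whose data would be about 100 KB apart in the data file.
index = write_index([(b"row1", 0), (b"row2", 102400)])
keys = list(iter_keys(io.BytesIO(index)))
```

With 4-byte keys, each entry in this toy layout costs 14 bytes to read, against 10 KB to over 100 KB per row when scanning the data. The sketch ignores everything the comment above flags as hard: concurrent writes, compaction, and consistency levels other than ONE.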

This message is automatically generated by JIRA.

