cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Haddad <...@jonhaddad.com>
Subject Re: Internal pagination in secondary index queries
Date Mon, 29 Dec 2014 23:45:18 GMT
Secondary indexes are there for convenience, not performance.  If you're
looking for something performant, you'll need to maintain your own indexes.


On Mon Dec 29 2014 at 3:22:58 PM Sam Klock <sklock@akamai.com> wrote:

> Hi folks,
>
> Perhaps this is a question better addressed to the Cassandra developers
> directly, but I thought I'd ask it here first.  We've recently been
> benchmarking certain uses of secondary indexes in Cassandra 2.1.x, and
> we've noticed that when the number of items in an index reaches beyond
> some threshold (perhaps several tens of thousands depending on the
> cardinality) performance begins to degrade substantially.  This is
> particularly the case when the client does things it probably shouldn't
> do (like manually paginate results), but we suspect there's at least
> one issue in Cassandra having an impact here that we'd like to
> understand better.
>
> Our investigation led us to logic in Cassandra used to paginate scans
> of rows in indexes on composites.  The issue seems to be the short
> algorithm Cassandra uses to select the size of the pages for the scan,
> partially given on the following two lines (from
> o.a.c.db.index.composites.CompositesSearcher):
>
> private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(),
> 1);
> private int rowsPerQuery = Math.max(Math.min(filter.maxRows(),
> filter.maxColumns() / meanColumns), 2);
>
> The value computed for rowsPerQuery appears to be the page size.
>
> Based on our reading of the code, unless the value obtained for
> meanColumns is very small, a large query-level page size is used, or
> the DISTINCT keyword is used, the value for (filter.maxColumns() /
> meanColumns) always ends up being small enough that the page size is
> 2.  This seems to be the case both for very low-cardinality indexes
> (two different indexed values) and for indexes with higher
> cardinalities as long as the number of entries per index row is more
> than a few thousand.
>
> The fact that we consistently get such a small page size appears to
> have a substantial impact on performance.  The overhead is simply
> devastating, especially since it looks like the pages are likely to
> overlap with each other (the last element of one page is the first
> element of the next).  To wit: if we fix the index page size in code to
> a very large number, index queries in our environment that prior
> required over two minutes to complete can finish in under ten seconds.
>
> Some (but probably not this much) overhead might be acceptable if the
> algorithm is intended to achieve other worthy goals (safety?).  But
> what's puzzling to us is that we can't figure out what it's intended to
> do.  We suspect the algorithm is simply buggy, but we'd like insight
> from knowledgeable parties before we draw that conclusion and try to
> find a different solution.
>
> Does anyone here have relevant experience with secondary indexes that
> might shed light on the design choice here?  In particular, can anyone
> (perhaps the developers?) explain what this algorithm is intended to do
> and what we might do to safely get around this limitation?
>
> Also (to the developers watching this list): is this the sort of
> question we should be addressing to the dev list directly?
>
> Thanks,
> SK
>

Mime
View raw message