cassandra-user mailing list archives

From Ondřej Černoš <cern...@gmail.com>
Subject Re: various Cassandra performance problems when CQL3 is really used
Date Wed, 15 Jan 2014 09:49:53 GMT
Hi,

by the way, some of the issues are summarised here:
https://issues.apache.org/jira/browse/CASSANDRA-6586 and here:
https://issues.apache.org/jira/browse/CASSANDRA-6587.

regards,

ondrej cernos


On Tue, Jan 14, 2014 at 9:48 PM, Ondřej Černoš <cernoso@gmail.com> wrote:

> Hi,
>
> thanks for the answer and sorry for the delay. Let me answer inline.
>
>
> On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <aaron@thelastpickle.com> wrote:
>
>> > * select id from table where token(id) > token(some_value) and
>> > secondary_index = other_val limit 2 allow filtering;
>> >
>> > Filtering absolutely kills the performance. On a table populated with
>> > 130,000 records, a single-node Cassandra server (on my i7 notebook, 2GB of
>> > JVM heap) and a secondary index built on a column whose value set has low
>> > cardinality, this query takes 156 seconds to finish.
>> Yes, this is why you have to add ALLOW FILTERING. You are asking the
>> nodes to read all the data that matches and filter it in memory; that's a
>> SQL-type operation.
>>
>> Your example query is somewhat complex and I doubt it could get decent
>> performance; what does the query plan look like?
>>
>
> I don't know. How do I find out? The only mention of query plans in
> Cassandra I could find is the article on your site from 2011, covering
> version 0.8.
>
> The example query completes in a fraction of the time if I just fetch all
> rows matching the token function and do the filtering client side.
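>
> By client-side filtering I mean roughly the following (the page size of 1000
> is just an illustration):
>
> select id, secondary_index from table where token(id) > token(?) limit 1000;
>
> The client then drops the rows where secondary_index does not match and asks
> for the next page starting from the token of the last returned id.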
>
>
>
>> IMHO you need to do further de-normalisation; you will get the best
>> performance when you select rows by their full or partial primary key.
>
>
> I denormalize as far as I can. The problem is that I need to support paging
> and filtering at the same time. The API I must support allows filtering by
> example and paging - so how should I denormalize? Should I somehow manage
> pages of primary row keys manually? Or should I keep a manual secondary index
> and somehow page within the denormalized wide row?
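>
> Just to illustrate what I mean by a manual secondary index - a sketch with
> made-up table and column names:
>
> create table items_by_state (
>     state text,   -- the low-cardinality column I filter on
>     id text,      -- the row key of the real item
>     primary key (state, id)
> );
>
> select id from items_by_state where state = 'active' and id > 'last_seen_id' limit 100;
>
> Maintaining such a table by hand for every filterable column is exactly the
> kind of work I hoped secondary indexes would do for me.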
>
> The trouble goes even further: even this doesn't perform well:
>
> select id from table where token(id) > token(some_value) and pk_cluster =
> 'val' limit N;
>
> where id and pk_cluster form the primary key (a CQL3 table). I would expect
> this to translate into an ordered row query plus an ordered column slice
> query, so where is the performance problem?
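>
> The table is, simplified, shaped like this (sample_table and value are
> placeholder names):
>
> create table sample_table (
>     id text,
>     pk_cluster text,
>     value text,
>     primary key (id, pk_cluster)
> );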
>
>
>
>> > By the way, the performance is an order of magnitude better if this
>> > patch is applied:
>> That looks like it's tuned to your specific need; it would ignore the
>> maximum number of results included in the query.
>
>
> It is tuned; it only demonstrates that the heuristic doesn't work well.
>
>
>> > * select id from table;
>> >
>> > As we saw in the trace log, the query - although it selects just row
>> > ids - scans all columns of all the rows and (probably) compares TTLs with
>> > the current time (?) (we saw hundreds of thousands of gettimeofday(2)
>> > calls). This means that if the table mixes wide and narrow rows, the
>> > performance suffers horribly.
>> Selecting all rows from a table requires a range scan, which reads all
>> rows from all nodes. It should never be used in production.
>>
>
> The trouble is that I just need to perform it, sometimes. I know what the
> problem with the query is, but I have only some 150,000 records - the whole
> dataset can be stored in memory and the SSTables can be fully mmapped. There
> is no reason for this query to be slow in this case.
>
>
>> Not sure what you mean by “scans all columns from all rows”; a select by
>> column name will use a SliceByNamesReadCommand, which will only read the
>> required columns from each SSTable (it normally short-circuits and reads
>> from fewer).
>>
>
> The query should fetch only IDs, yet it checks the TTLs of the columns. That
> is the point. Why does it do that?
>
>
>> If there is a TTL, the ExpiringColumn.localExpirationTime must be checked;
>> if there is no TTL, it will not be checked.
>
>
> It is a standard CQL3 table with an ID, a couple of columns and a CQL3
> collection. I didn't set any TTL on the table or its columns.
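>
> For concreteness, the table is shaped roughly like this (the names are made
> up):
>
> create table inventory_item (
>     id text primary key,
>     name text,
>     description text,
>     tags set<text>
> );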
>
>
>> > As Cassandra checks all the columns in selects, performance suffers
>> > badly if the collection is of any interesting size.
>> This is not true; could you provide an example of where you think this is
>> happening?
>
>
> We saw it in the trace log. It happened in the select ID from table query.
> The table had a collection column.
>
>
>> > Additionally, we saw various random irreproducible freezes, high CPU
>> > consumption when nothing was happening (even with trace log level set, no
>> > activity was reported) and highly unpredictable performance characteristics
>> > after nodetool flush and/or major compaction.
>> What was the hardware platform and what was the load?
>>
>
> My i7/8GB notebook running a single-node cluster, and a virtualised,
> AWS-like environment with nodes of various sizes.
>
>
>> Typically, freezes in the server correlate with JVM GC; the GC can also
>> be using the CPU.
>> If you have wide rows or make large reads you may run into more JVM GC
>> issues.
>>
>
>
>> nodetool flush will (as it says) flush all the tables to disk; if you
>> have a lot of tables and/or a lot of secondary indexes this can cause the
>> switch lock to be held, preventing write threads from progressing. Once the
>> flush threads stop waiting on the flush queue the lock will be released.
>> See the help for memtable_flush_queue_size in the yaml file.
>>
>
> I will check this, thanks.
>
>
>> Major compaction is not recommended in production. If you are seeing it
>> cause performance problems I would guess it is related to JVM GC and/or the
>> disk IO not being able to keep up. When used, it creates a single SSTable
>> per table, which will not be compacted again until (by default) 3 other
>> large SSTables are created or you run major compaction again. For this
>> reason it is not recommended.
>>
>> > Conclusions:
>> >
>> > - do not use collections
>> > - do not use secondary indexes
>> > - do not use filtering
>> > - have your rows as narrow as possible if you need any kind of
>> >   all-row-keys traversal
>> These features all have a use, but it looks like you leaned on them
>> heavily while creating a relational model. Especially the filtering: you
>> have to enable it explicitly precisely to stop clients from sending queries
>> that will take a long time.
>> The only time row key traversal is normally used is when reading data
>> through Hadoop. You should always strive to read rows from a table by the
>> full or partial primary key.
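>>
>> Something like this, just as an illustration (the table and column names
>> are made up):
>>
>> create table users_by_org (
>>     org_id text,
>>     user_id text,
>>     name text,
>>     primary key (org_id, user_id)
>> );
>>
>> select name from users_by_org where org_id = 'acme' and user_id = 'u1'; -- full key, one row
>> select name from users_by_org where org_id = 'acme'; -- partial key, one ordered slice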
>
>
>
>> > With these conclusions in mind, CQL seems redundant, plain old Thrift
>> > may be used, joins should be done client side and/or all indexes need to
>> > be handled manually. Correct?
>> No.
>> CQL provides a set of functionality not present in the Thrift API.
>> Joins and indexes should generally be handled by denormalising the data
>> during writes.
>>
>> It sounds like your data model was too relational; you need to
>> denormalise and read rows by primary key. Secondary indexes are useful when
>> you have a query pattern that is used infrequently.
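>>
>> As a rough sketch of denormalising during writes (the table names are only
>> an example), every write goes to the base table and to the query tables that
>> serve your read patterns:
>>
>> begin batch
>>     insert into devices (id, rack, model) values ('d1', 'r7', 'm300');
>>     insert into devices_by_rack (rack, id, model) values ('r7', 'd1', 'm300');
>> apply batch;
>>
>> Reads then stay single-partition primary key lookups.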
>>
>
> regards,
>
> ondrej cernos
>
>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 18/12/2013, at 3:47 am, Ondřej Černoš <cernoso@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > we are reimplementing a legacy interface of an inventory-like service
>> > (currently built on top of MySQL) on Cassandra and I thought I would share
>> > some findings with the list. The interface semantics are given and cannot
>> > be changed. We chose Cassandra for its multi-datacenter capabilities and
>> > no-SPOF qualities. The dataset is small (6 tables holding 150,000 records
>> > and a bunch of tables with up to thousands of records), but the model is
>> > not trivial - the MySQL model has some 20+ tables, joins are frequent, m:n
>> > relationships are frequent, and the like. The interface is read heavy. We
>> > thought the size of the dataset would allow the whole dataset to fit into
>> > the memory of each node (a 3-node cluster in each DC, 3 replicas, local
>> > quorum operations) and that even though some operations (like secondary
>> > index lookups) are not superfast, given the size of the dataset it should
>> > perform OK. We were wrong.
>> >
>> > We use CQL3 exclusively and we use all of its capabilities
>> > (collections, secondary indexes, filtering), because they make the data
>> > model maintainable. We denormalised what had to be denormalised in order
>> > to avoid client-side joins. A usual query to the storage means one CQL
>> > query on a denormalised table. We need to support integer offset/limit
>> > paging, filter-by-example kinds of queries, M:N relationship queries and
>> > all the usual suspects of an old SQL-backed interface.
>> >
>> > This is the list of operations we have identified so far that perform
>> > really poorly. The row id is called id in the following:
>> >
>> > * select id from table where token(id) > token(some_value) and
>> > secondary_index = other_val limit 2 allow filtering;
>> >
>> > Filtering absolutely kills the performance. On a table populated with
>> > 130,000 records, a single-node Cassandra server (on my i7 notebook, 2GB of
>> > JVM heap) and a secondary index built on a column whose value set has low
>> > cardinality, this query takes 156 seconds to finish.
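>> >
>> > For reference, the setup is roughly the following (example_table and name
>> > are placeholder names; secondary_index is the low-cardinality column from
>> > the query above):
>> >
>> > create table example_table (id text primary key, secondary_index text, name text);
>> > create index on example_table (secondary_index);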
>> >
>> > By the way, the performance is an order of magnitude better if this
>> > patch is applied:
>> >
>> > diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
>> > index 5ab1df6..13af671 100644
>> > --- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
>> > +++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
>> > @@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
>> >
>> >             private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
>> >             // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
>> > -            private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
>> > +//            private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
>> > +            private final int rowsPerQuery = 100000;
>> >
>> >             public boolean needsFiltering()
>> >             {
>> >
>> > * select id from table;
>> >
>> > As we saw in the trace log, the query - although it selects just row
>> > ids - scans all columns of all the rows and (probably) compares TTLs with
>> > the current time (?) (we saw hundreds of thousands of gettimeofday(2)
>> > calls). This means that if the table mixes wide and narrow rows, the
>> > performance suffers horribly.
>> >
>> > * CQL collections
>> >
>> > See the point above about mixing wide and narrow rows. As Cassandra
>> > checks all the columns in selects, performance suffers badly if the
>> > collection is of any interesting size.
>> >
>> > Additionally, we saw various random irreproducible freezes, high CPU
>> > consumption when nothing was happening (even with trace log level set, no
>> > activity was reported) and highly unpredictable performance characteristics
>> > after nodetool flush and/or major compaction.
>> >
>> > Conclusions:
>> >
>> > - do not use collections
>> > - do not use secondary indexes
>> > - do not use filtering
>> > - have your rows as narrow as possible if you need any kind of
>> >   all-row-keys traversal
>> >
>> > With these conclusions in mind, CQL seems redundant, plain old Thrift
>> > may be used, joins should be done client side and/or all indexes need to
>> > be handled manually. Correct?
>> >
>> > Thanks for reading,
>> >
>> > ondrej cernos
>>
>>
>
