incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: CQL : Request did not complete within rpc_timeout
Date Sun, 03 Feb 2013 16:08:52 GMT
Secondary indexes need at least one equality. If you want to do this
at scale you might need a different design.
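
As a rough sketch of what a "different design" could look like (an illustration only, not something tested against this data set): a second table keyed for this exact query, with the hour as the partition key and the date as a clustering column, so the equality and the range are both served by the primary key. The table name offerten_log_by_stunden and the choice of columns are hypothetical; the column names come from the schema quoted below.

```cql
-- Hypothetical query table: one partition per hour, rows ordered by date.
-- offerte_id is part of the key only to keep rows unique.
CREATE TABLE offerten_log_by_stunden (
  stunden int,
  datum timestamp,
  offerte_id int,
  klicks int,
  kosten bigint,
  PRIMARY KEY (stunden, datum, offerte_id)
);

-- Equality on the partition key, range on the first clustering column:
-- no secondary index and no ALLOW FILTERING needed.
SELECT * FROM offerten_log_by_stunden
WHERE stunden = 0 AND datum > '2013-01-01'
LIMIT 10;
```

Note that with only 24 possible hour values the partitions would get very large, so a real design would probably add a bucket (for example the month) to the partition key.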

Using ALLOW FILTERING and LIMIT 10 simply grabs the first few rows
that happen to match your criteria.

When you have GB or TB of data, any query that needs "ALLOW FILTERING"
will not work at scale.

This is why the clause was added to the language. CQL lets you run
some queries that "seem fast" when you're developing with 10 rows.
Without this clause you would not know whether a query is fast because
it hits a Cassandra index, or just because the results were found in
the first 10 rows.

Edward

On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven
<paul.van.hoven@googlemail.com> wrote:
> Okay, here is the schema (actually it is in german, but I translated
> the column names such that it is easier to read for an international
> audience):
>
> cqlsh:demodb> describe table offerten_log_archiv;
>
> CREATE TABLE offerten_log_archiv (
>   offerte_id int PRIMARY KEY,
>   aktionen int,
>   angezeigt bigint,
>   datum timestamp,
>   gutschrift bigint,
>   kampagne_id int,
>   klicks int,
>   klicks_ungueltig int,
>   kosten bigint,
>   statistik_id bigint,
>   stunden int,
>   werbeflaeche_id int,
>   werbemittel_id int
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'};
>
> CREATE INDEX datum_key ON offerten_log_archiv (datum);
>
> CREATE INDEX stunden_key ON offerten_log_archiv (stunden);
>
> cqlsh:demodb>
>
> This is the query I'm trying to perform:
> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
> limit 10 allow filtering;
> Request did not complete within rpc_timeout.
>
> ola = offerten_log_archiv (table name)
> hour = stunden (column name)
> date = datum (column name)
>
> I hope this information makes my problem more clear.
>
>
>
> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>> Without seeing your schema it is hard to say, but in some cases "ALLOW
>> FILTERING" might be considered "EXPECT THIS COULD BE SLOW". It could
>> mean the query is not hitting an index and is going to page through
>> large amounts of data.
>>
>> On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven
>> <paul.van.hoven@googlemail.com> wrote:
>>> After figuring out how to use the ">" operator on a secondary index I
>>> noticed that in a column family of about 5.5 million datasets I get an
>>> rpc_timeout when trying to read data from this table. In the concrete
>>> situation I want to request data younger than January 1 2013. The
>>> number of rows that should be affected are about 1 million. When doing
>>> the request I get a timeout error:
>>>
>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
>>> limit 10 allow filtering;
>>> Request did not complete within rpc_timeout.
>>>
>>> Actually I find this very confusing since I would expect an
>>> exceptional performance gain in comparison to a similar sql query.
>>> Therefore, I think the query I'm performing is not appropriate for
>>> cassandra, although I would do a query like that in this manner on a
>>> sql database. So my question now is: How should I perform this query
>>> on cassandra?
