incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: CQL : Request did not complete within rpc_timeout
Date Sun, 03 Feb 2013 19:22:08 GMT
This was the issue that prompted the "WITH FILTERING ALLOWED":

https://issues.apache.org/jira/browse/CASSANDRA-4915

Cassandra's storage system can only optimize certain queries.

On Sun, Feb 3, 2013 at 2:07 PM, Paul van Hoven
<paul.van.hoven@googlemail.com> wrote:
> I'm not sure if I understood your answer.
>
>> When you have GB or TB of data any query that adds "WITH FILTERING"
>> will not work at scale.
> 1. You mean any query that requires "with filtering" is slow?
>
>> Secondary indexes need at least one equality. If you want to do this
>> at scale you might need a different design.
> 2. And what design would be recommendable then?
>
> 3. How should the query look like such that it would scale?
>
>
>
> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>> Secondary indexes need at least one equality. If you want to do this
>> at scale you might need a different design.
>>
>> Using WITH FILTERING and LIMIT 10 is simply grabbing the first few
>> random rows that match your criteria.
>>
>> When you have GB or TB of data any query that adds "WITH FILTERING"
>> will not work at scale.
>>
>> This is why it was added to the language CQL lets you do some queries
>> that "seem fast" when your developing with 10 rows, without this
>> clause you would not know if a query is fast because it hits a
>> cassandra index, or it is just fast because the results were found in
>> the first 10 rows.
>>
>> Edward
>>
>> On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven
>> <paul.van.hoven@googlemail.com> wrote:
>>> Okay, here is the schema (actually it is in german, but I translated
>>> the column names such that it is easier to read for an international
>>> audience):
>>>
>>> cqlsh:demodb> describe table offerten_log_archiv;
>>>
>>> CREATE TABLE offerten_log_archiv (
>>>   offerte_id int PRIMARY KEY,
>>>   aktionen int,
>>>   angezeigt bigint,
>>>   datum timestamp,
>>>   gutschrift bigint,
>>>   kampagne_id int,
>>>   klicks int,
>>>   klicks_ungueltig int,
>>>   kosten bigint,
>>>   statistik_id bigint,
>>>   stunden int,
>>>   werbeflaeche_id int,
>>>   werbemittel_id int
>>> ) WITH
>>>   bloom_filter_fp_chance=0.010000 AND
>>>   caching='KEYS_ONLY' AND
>>>   comment='' AND
>>>   dclocal_read_repair_chance=0.000000 AND
>>>   gc_grace_seconds=864000 AND
>>>   read_repair_chance=0.100000 AND
>>>   replicate_on_write='true' AND
>>>   compaction={'class': 'SizeTieredCompactionStrategy'};
>>>
>>> CREATE INDEX datum_key ON offerten_log_archiv (datum);
>>>
>>> CREATE INDEX stunden_key ON offerten_log_archiv (stunden);
>>>
>>> cqlsh:demodb>
>>>
>>> This is the query I'm trying to perform:
>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
>>> limit 10 allow filtering;
>>> Request did not complete within rpc_timeout.
>>>
>>> ola = offerten_log_archiv (table name)
>>> hour = stunde (column name)
>>> date = datum (column name)
>>>
>>> I hope this information makes my problem more clear.
>>>
>>>
>>>
>>> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>>>> Without seeing your schema it is hard to say, but in some cases "ALLOW
>>>> FILTERING" might be considered "EXPECT THIS COULD BE SLOW". It could
>>>> mean the query is not hitting and index and is going to page through
>>>> large amounts of data.
>>>>
>>>> On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven
>>>> <paul.van.hoven@googlemail.com> wrote:
>>>>> After figuring out how to use the ">" operator on an secondary index
I
>>>>> noticed that in a column family of about 5.5 million datasets I get a
>>>>> rpc_timeout when trying to read data from this table. In the concrete
>>>>> situation I want to request data younger than January 1 2013. The
>>>>> number of rows that should be affected are about 1 million. When doing
>>>>> the request I get a timeout error:
>>>>>
>>>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour
= 0
>>>>> limit 10 allow filtering;
>>>>> Request did not complete within rpc_timeout.
>>>>>
>>>>> Actually I find this very confusing since I would except an
>>>>> exceptional performance gain in comparison to a similar sql query.
>>>>> Therefore, I think the query I'm performing is not appropriate for
>>>>> cassandra, although I would do a query like that in this manner on a
>>>>> sql database. So my question now is: How should I perfrom this query
>>>>> on cassandra?

Mime
View raw message