incubator-cassandra-user mailing list archives

From Paul van Hoven <paul.van.ho...@googlemail.com>
Subject Re: CQL : Request did not complete within rpc_timeout
Date Sun, 03 Feb 2013 20:42:12 GMT
Thanks for the answer. Could anybody else answer my other two questions?
My problem is not solved yet.

2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
> This was the issue that prompted the "ALLOW FILTERING" clause:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4915
>
> Cassandra's storage system can only optimize certain queries.
>
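Edward's remark that the storage system can only optimize certain queries, together with his earlier note that secondary indexes need at least one equality, can be illustrated with a toy index. This is a minimal sketch and not Cassandra's implementation. It assumes the index behaves like a hash map from an indexed value to the row keys that hold it. The `HashIndex` class and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class HashIndex:
    """Toy secondary index mapping an indexed value to the row keys holding it.

    Hypothetical model: a lookup by exact value is one hash-map access, but
    values are not kept in sorted order, so a range condition has to visit
    every distinct indexed value.
    """

    def __init__(self, rows: Iterable[Tuple[int, int]]) -> None:
        # rows are (primary_key, indexed_value) pairs
        self._entries: Dict[int, Set[int]] = defaultdict(set)
        for primary_key, value in rows:
            self._entries[value].add(primary_key)

    def equal(self, value: int) -> Set[int]:
        """Row keys whose indexed value == value: one direct lookup."""
        return set(self._entries.get(value, set()))

    def greater_than(self, value: int) -> Tuple[Set[int], int]:
        """Row keys whose indexed value > value, plus how many entries were visited."""
        found: Set[int] = set()
        visited = 0
        for indexed_value, keys in self._entries.items():
            visited += 1  # no sorted order to exploit, so every entry is checked
            if indexed_value > value:
                found |= keys
        return found, visited


# 240 rows whose indexed column (think "stunden") cycles through 0..23
index = HashIndex((offerte_id, offerte_id % 24) for offerte_id in range(240))
print(len(index.equal(0)))        # 10 matching rows, found by one lookup
print(index.greater_than(21)[1])  # 24: every distinct value was visited
```

With only 24 distinct hour values, the difference is small. In this model, a column like `datum` gives nearly every row its own value, so a range condition approaches a walk over the whole index.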
> On Sun, Feb 3, 2013 at 2:07 PM, Paul van Hoven
> <paul.van.hoven@googlemail.com> wrote:
>> I'm not sure if I understood your answer.
>>
>>> When you have GB or TB of data, any query that adds "ALLOW FILTERING"
>>> will not work at scale.
>> 1. Do you mean that any query that requires "allow filtering" is slow?
>>
>>> Secondary indexes need at least one equality. If you want to do this
>>> at scale you might need a different design.
>> 2. And what design would you recommend then?
>>
>> 3. What should the query look like so that it scales?
>>
>>
>>
>> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>>> Secondary indexes need at least one equality. If you want to do this
>>> at scale you might need a different design.
>>>
>>> Using ALLOW FILTERING and LIMIT 10 simply grabs the first few random
>>> rows that match your criteria.
>>>
>>> When you have GB or TB of data, any query that adds "ALLOW FILTERING"
>>> will not work at scale.
>>>
>>> This is why the clause was added to the language: CQL lets you run
>>> some queries that "seem fast" when you're developing with 10 rows.
>>> Without this clause you would not know whether a query is fast because
>>> it hits a Cassandra index, or just fast because the results were found
>>> in the first 10 rows.
>>>
>>> Edward
>>>
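Edward points out that ALLOW FILTERING with LIMIT 10 can look fast on a ten-row development table but not on a large one. A toy scan can show this. The sketch is simplified and rests on an assumption: it models a query that reads rows in storage order and discards the ones that do not match, which is not Cassandra's actual read path. `Row`, `filtered_scan` and `wanted` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from itertools import chain
from typing import Callable, Iterable, List, Tuple


@dataclass
class Row:
    offerte_id: int
    stunden: int
    datum: date


def filtered_scan(rows: Iterable[Row], keep: Callable[[Row], bool],
                  limit: int) -> Tuple[List[Row], int]:
    """Read rows in storage order, keep matches, stop once `limit` are found.

    Returns the matches and how many rows had to be read to find them.
    """
    matches: List[Row] = []
    examined = 0
    for row in rows:
        examined += 1
        if keep(row):
            matches.append(row)
            if len(matches) == limit:
                break
    return matches, examined


def wanted(row: Row) -> bool:
    # the filter from the thread: hour = 0 and date after 2013-01-01
    return row.stunden == 0 and row.datum > date(2013, 1, 1)


# Development table: 10 rows, all matching, so only 10 rows are read.
dev_rows = [Row(i, 0, date(2013, 1, 2)) for i in range(10)]
_, dev_examined = filtered_scan(dev_rows, wanted, limit=10)

# Larger table: 100,000 older rows happen to be stored before the 10 matches.
big_rows = chain((Row(i, i % 24, date(2012, 6, 1)) for i in range(100_000)),
                 (Row(100_000 + i, 0, date(2013, 1, 2)) for i in range(10)))
big_matches, big_examined = filtered_scan(big_rows, wanted, limit=10)

print(dev_examined, len(big_matches), big_examined)  # 10 10 100010
```

Both runs return ten rows. What changes is how many rows must be read to find them. In this model, the cost depends on where the matches happen to sit, not on the LIMIT. That is how a query that is instant in development can run past a timeout such as rpc_timeout on millions of rows.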
>>> On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven
>>> <paul.van.hoven@googlemail.com> wrote:
>>>> Okay, here is the schema (it is actually in German, but I translated
>>>> the column names so that it is easier to read for an international
>>>> audience):
>>>>
>>>> cqlsh:demodb> describe table offerten_log_archiv;
>>>>
>>>> CREATE TABLE offerten_log_archiv (
>>>>   offerte_id int PRIMARY KEY,
>>>>   aktionen int,
>>>>   angezeigt bigint,
>>>>   datum timestamp,
>>>>   gutschrift bigint,
>>>>   kampagne_id int,
>>>>   klicks int,
>>>>   klicks_ungueltig int,
>>>>   kosten bigint,
>>>>   statistik_id bigint,
>>>>   stunden int,
>>>>   werbeflaeche_id int,
>>>>   werbemittel_id int
>>>> ) WITH
>>>>   bloom_filter_fp_chance=0.010000 AND
>>>>   caching='KEYS_ONLY' AND
>>>>   comment='' AND
>>>>   dclocal_read_repair_chance=0.000000 AND
>>>>   gc_grace_seconds=864000 AND
>>>>   read_repair_chance=0.100000 AND
>>>>   replicate_on_write='true' AND
>>>>   compaction={'class': 'SizeTieredCompactionStrategy'};
>>>>
>>>> CREATE INDEX datum_key ON offerten_log_archiv (datum);
>>>>
>>>> CREATE INDEX stunden_key ON offerten_log_archiv (stunden);
>>>>
>>>> cqlsh:demodb>
>>>>
>>>> This is the query I'm trying to perform:
>>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
>>>> limit 10 allow filtering;
>>>> Request did not complete within rpc_timeout.
>>>>
>>>> ola = offerten_log_archiv (table name)
>>>> hour = stunden (column name)
>>>> date = datum (column name)
>>>>
>>>> I hope this information makes my problem more clear.
>>>>
>>>>
>>>>
>>>> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>>>>> Without seeing your schema it is hard to say, but in some cases "ALLOW
>>>>> FILTERING" might be read as "EXPECT THIS COULD BE SLOW". It could
>>>>> mean the query is not hitting an index and is going to page through
>>>>> large amounts of data.
>>>>>
>>>>> On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven
>>>>> <paul.van.hoven@googlemail.com> wrote:
>>>>>> After figuring out how to use the ">" operator on a secondary index, I
>>>>>> noticed that in a column family of about 5.5 million records I get an
>>>>>> rpc_timeout when trying to read data from this table. Specifically, I
>>>>>> want to request data newer than January 1, 2013. The number of rows
>>>>>> that should match is about 1 million. When doing the request I get a
>>>>>> timeout error:
>>>>>>
>>>>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
>>>>>> limit 10 allow filtering;
>>>>>> Request did not complete within rpc_timeout.
>>>>>>
>>>>>> Actually I find this very confusing, since I would expect an
>>>>>> exceptional performance gain compared with a similar SQL query.
>>>>>> Therefore I think the query I'm performing is not appropriate for
>>>>>> Cassandra, although this is how I would write such a query on an
>>>>>> SQL database. So my question now is: how should I perform this query
>>>>>> on Cassandra?
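The "different design" asked about in this thread can be sketched with one common alternative. It is an assumption swapped in here, not something proposed in the thread. The idea is to store the rows in a second table whose partition key combines a day bucket with the hour, with `datum` as a clustering column. A query then reads one partition per day instead of filtering across the whole table. The Python below is only a toy in-memory model of that read pattern. `partition_keys`, `query` and the `(day, stunden)` key layout are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical partition key: (day bucket as 'YYYY-MM-DD', stunden)
PartitionKey = Tuple[str, int]


def partition_keys(first_day: date, last_day: date, stunden: int) -> List[PartitionKey]:
    """One partition key per calendar day in [first_day, last_day] for one hour."""
    keys: List[PartitionKey] = []
    day = first_day
    while day <= last_day:
        keys.append((day.isoformat(), stunden))
        day += timedelta(days=1)
    return keys


def query(store: Dict[PartitionKey, List[int]], first_day: date, last_day: date,
          stunden: int, limit: int) -> List[int]:
    """Read partitions oldest first until `limit` offer ids are collected.

    Every row read from a partition already matches, so nothing is read and
    then discarded; each step stands for one single-partition lookup.
    """
    result: List[int] = []
    for key in partition_keys(first_day, last_day, stunden):
        for offerte_id in store.get(key, []):
            result.append(offerte_id)
            if len(result) == limit:
                return result
    return result


# Toy data: offer ids stored under (day, hour) partitions.
store: Dict[PartitionKey, List[int]] = {
    ("2013-01-01", 0): [1, 2],
    ("2013-01-02", 0): [3],
    ("2013-01-02", 5): [4],   # hour 5, never read by an hour-0 query
    ("2013-01-03", 0): [5, 6],
}
print(query(store, date(2013, 1, 1), date(2013, 1, 3), stunden=0, limit=10))
# [1, 2, 3, 5, 6]
```

The trade-off is on the write side. Every write has to compute its bucket, and the application has to supply the last day of the range (for example, today's date). Partitioning by hour alone would leave only 24 very large partitions, which is why the sketch adds the day bucket.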
