incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: CQL : Request did not complete within rpc_timeout
Date Mon, 04 Feb 2013 01:00:57 GMT
On Mon 04 Feb 2013 04:42:12 AM CST, Paul van Hoven wrote:
> Thanks for the answer. Can anybody else answer my other two questions,
> because my problem is not solved yet?
>
> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>> This was the issue that prompted the "WITH FILTERING ALLOWED":
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-4915
>>
>> Cassandra's storage system can only optimize certain queries.
>>
>> On Sun, Feb 3, 2013 at 2:07 PM, Paul van Hoven
>> <paul.van.hoven@googlemail.com> wrote:
>>> I'm not sure if I understood your answer.
>>>
>>>> When you have GB or TB of data any query that adds "WITH FILTERING"
>>>> will not work at scale.
>>> 1. You mean any query that requires "with filtering" is slow?
>>>
>>>> Secondary indexes need at least one equality. If you want to do this
>>>> at scale you might need a different design.
>>> 2. And what design would be recommendable then?
>>>
>>> 3. How should the query look like such that it would scale?
>>>
>>>
>>>
>>> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>>>> Secondary indexes need at least one equality. If you want to do this
>>>> at scale you might need a different design.
>>>>
>>>> Using WITH FILTERING and LIMIT 10 is simply grabbing the first few
>>>> random rows that match your criteria.
>>>>
>>>> When you have GB or TB of data any query that adds "WITH FILTERING"
>>>> will not work at scale.
>>>>
>>>> This is why it was added to the language CQL lets you do some queries
>>>> that "seem fast" when your developing with 10 rows, without this
>>>> clause you would not know if a query is fast because it hits a
>>>> cassandra index, or it is just fast because the results were found in
>>>> the first 10 rows.
>>>>
>>>> Edward
>>>>
>>>> On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven
>>>> <paul.van.hoven@googlemail.com> wrote:
>>>>> Okay, here is the schema (actually it is in german, but I translated
>>>>> the column names such that it is easier to read for an international
>>>>> audience):
>>>>>
>>>>> cqlsh:demodb> describe table offerten_log_archiv;
>>>>>
>>>>> CREATE TABLE offerten_log_archiv (
>>>>>    offerte_id int PRIMARY KEY,
>>>>>    aktionen int,
>>>>>    angezeigt bigint,
>>>>>    datum timestamp,
>>>>>    gutschrift bigint,
>>>>>    kampagne_id int,
>>>>>    klicks int,
>>>>>    klicks_ungueltig int,
>>>>>    kosten bigint,
>>>>>    statistik_id bigint,
>>>>>    stunden int,
>>>>>    werbeflaeche_id int,
>>>>>    werbemittel_id int
>>>>> ) WITH
>>>>>    bloom_filter_fp_chance=0.010000 AND
>>>>>    caching='KEYS_ONLY' AND
>>>>>    comment='' AND
>>>>>    dclocal_read_repair_chance=0.000000 AND
>>>>>    gc_grace_seconds=864000 AND
>>>>>    read_repair_chance=0.100000 AND
>>>>>    replicate_on_write='true' AND
>>>>>    compaction={'class': 'SizeTieredCompactionStrategy'};
>>>>>
>>>>> CREATE INDEX datum_key ON offerten_log_archiv (datum);
>>>>>
>>>>> CREATE INDEX stunden_key ON offerten_log_archiv (stunden);
>>>>>
>>>>> cqlsh:demodb>
>>>>>
>>>>> This is the query I'm trying to perform:
>>>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour
= 0
>>>>> limit 10 allow filtering;
>>>>> Request did not complete within rpc_timeout.
>>>>>
>>>>> ola = offerten_log_archiv (table name)
>>>>> hour = stunde (column name)
>>>>> date = datum (column name)
>>>>>
>>>>> I hope this information makes my problem more clear.
>>>>>
>>>>>
>>>>>
>>>>> 2013/2/3 Edward Capriolo <edlinuxguru@gmail.com>:
>>>>>> Without seeing your schema it is hard to say, but in some cases "ALLOW
>>>>>> FILTERING" might be considered "EXPECT THIS COULD BE SLOW". It could
>>>>>> mean the query is not hitting and index and is going to page through
>>>>>> large amounts of data.
>>>>>>
>>>>>> On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven
>>>>>> <paul.van.hoven@googlemail.com> wrote:
>>>>>>> After figuring out how to use the ">" operator on an secondary
index I
>>>>>>> noticed that in a column family of about 5.5 million datasets
I get a
>>>>>>> rpc_timeout when trying to read data from this table. In the
concrete
>>>>>>> situation I want to request data younger than January 1 2013.
The
>>>>>>> number of rows that should be affected are about 1 million. When
doing
>>>>>>> the request I get a timeout error:
>>>>>>>
>>>>>>> cqlsh:demodb> select * from ola where date > '2013-01-01'
and hour = 0
>>>>>>> limit 10 allow filtering;
>>>>>>> Request did not complete within rpc_timeout.
>>>>>>>
>>>>>>> Actually I find this very confusing since I would except an
>>>>>>> exceptional performance gain in comparison to a similar sql query.
>>>>>>> Therefore, I think the query I'm performing is not appropriate
for
>>>>>>> cassandra, although I would do a query like that in this manner
on a
>>>>>>> sql database. So my question now is: How should I perfrom this
query
>>>>>>> on cassandra?

according to your query, maybe this will be better

CREATE TABLE offerten_log_archiv (
  offerte_id int,
  aktionen int,
  angezeigt bigint,
  datum timestamp,
  gutschrift bigint,
  kampagne_id int,
  klicks int,
  klicks_ungueltig int,
  kosten bigint,
  statistik_id bigint,
  stunden int,
  werbeflaeche_id int,
  werbemittel_id int
  PRIMARY KEY(offerte_id, datum)
) WITH
...

Mime
View raw message