cassandra-commits mailing list archives

From "Vijay (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-4705) Speculative execution for CL_ONE
Date Fri, 23 Nov 2012 18:28:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503260#comment-13503260 ]

Vijay edited comment on CASSANDRA-4705 at 11/23/12 6:28 PM:
------------------------------------------------------------

Hi Jonathan, Sorry for the delay.

{quote}
Would it make more sense to have getReadLatencyRate and UpdateSampleLatencies into SR? that
way we could replace case statements with polymorphism.
{quote}
The problem is that the expensive percentile calculation has to run asynchronously on a
scheduled TPE. We could avoid the switch by introducing an additional SRFactory that initializes
the TPE according to changes in the CF settings. Let me know.
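
A minimal sketch of the SRFactory idea above, not the code from the attached patch. All
names here (SpeculativeRetry, FixedRetry, PercentileRetry, SpeculativeRetryFactory) are
hypothetical, and the nearest-rank percentile is only one possible choice. The point is that
the read path calls one method, while the factory decides whether a scheduled task is needed.

```java
import java.util.Arrays;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical names throughout; this only illustrates the shape of the idea.
interface SpeculativeRetry {
    // Latency after which a speculative read would be sent, in microseconds.
    long thresholdMicros();
}

// Fixed threshold: nothing to recompute, so no scheduled task is needed.
final class FixedRetry implements SpeculativeRetry {
    private final long micros;
    FixedRetry(long micros) { this.micros = micros; }
    public long thresholdMicros() { return micros; }
}

// Percentile threshold: the sort runs off the read path; reads only see a cached value.
final class PercentileRetry implements SpeculativeRetry {
    private final double percentile;               // e.g. 0.99
    private volatile long cached = Long.MAX_VALUE; // no speculation until a sample exists

    PercentileRetry(double percentile) { this.percentile = percentile; }

    // Nearest-rank percentile over a snapshot of recorded latencies.
    void refresh(long[] samples) {
        if (samples.length == 0) return;
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(percentile * sorted.length) - 1;
        cached = sorted[Math.max(0, idx)];
    }

    public long thresholdMicros() { return cached; }
}

// The factory owns the scheduled executor and wires it up only where required.
final class SpeculativeRetryFactory {
    private final ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);

    SpeculativeRetry fixed(long micros) { return new FixedRetry(micros); }

    SpeculativeRetry percentile(double p, Supplier<long[]> samples, long periodMillis) {
        PercentileRetry retry = new PercentileRetry(p);
        scheduler.scheduleWithFixedDelay(() -> retry.refresh(samples.get()),
                                         periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return retry;
    }

    void shutdown() { scheduler.shutdownNow(); }
}
```

With this shape the caller never switches on the retry type; a settings change would simply
ask the factory for a new instance.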

{quote}
Why does preprocess return a boolean now?
{quote}
The current patch uses the boolean to indicate whether the response was actually processed.
It is used by RCB after the patch when the co-ordinator receives more than one response
from the same host (when SR is on and the actual read response gets back at the same time
as the speculated response); we should not count the extra response towards the consistency level.
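
A minimal sketch of that boolean contract, assuming a simplified counter in place of the real
callback. ResponseCounter, its String host identifiers and its method names are hypothetical
and are not taken from the patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the callback that counts read responses.
final class ResponseCounter {
    private final int blockFor; // responses required by the consistency level
    private final Set<String> seenHosts = ConcurrentHashMap.newKeySet();
    private final AtomicInteger received = new AtomicInteger();

    ResponseCounter(int blockFor) { this.blockFor = blockFor; }

    // Returns true only for the first response seen from a given host.
    boolean preprocess(String host) {
        return seenHosts.add(host);
    }

    // A second response from the same host is ignored for counting purposes.
    void response(String host) {
        if (preprocess(host)) {
            received.incrementAndGet();
        }
    }

    boolean satisfied() { return received.get() >= blockFor; }
}
```

Two responses from the same host therefore count once, so a duplicate cannot satisfy a
consistency level that needs two distinct replicas.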

{quote}
How does/should SR interact with RR? Using ALL + RRR
{quote}
Currently we do an additional read to double-check whether we need to write. I thought the
goal for ALL was to eliminate that and do an additional write instead... In most cases it will be
a memtable update :)
I can think of 2 options:
1) Just document the ALL case and live with the additional writes; it might not be a big issue
for most cases, and for the rest the user can switch to the default behavior.
2) We can queue the repair Mutations; in the Async thread we can check whether duplicate
mutations are pending and, if so, just ignore the duplicates. This can be done by doing
sendRR and adding the CF to be repaired to a HashSet (at the cost of additional memory footprint).
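
A minimal sketch of option 2, assuming a repair can be identified by endpoint, CF and row key.
PendingRepairs and that key format are hypothetical; the real identity of a repair mutation
may need to be different.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the key format (endpoint/CF/row key) is an assumption.
final class PendingRepairs {
    private final Set<String> pending = new HashSet<>();

    private static String id(String endpoint, String cf, String rowKey) {
        return endpoint + '/' + cf + '/' + rowKey;
    }

    // Returns true if the caller should send the repair,
    // false if an identical repair is already pending.
    synchronized boolean tryQueue(String endpoint, String cf, String rowKey) {
        return pending.add(id(endpoint, cf, rowKey));
    }

    // Called once the repair write has been acknowledged or has timed out.
    synchronized void completed(String endpoint, String cf, String rowKey) {
        pending.remove(id(endpoint, cf, rowKey));
    }

    // The set is the extra memory footprint mentioned above.
    synchronized int size() { return pending.size(); }
}
```

The set only grows with the number of in-flight repairs, provided completed() is always called.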

Should we move this discussion to a different ticket?

Let me know, Thanks!
                
      was (Author: vijay2win@yahoo.com):
    Hi Jonathan, Sorry for the delay.

{quote}
Would it make more sense to have getReadLatencyRate and UpdateSampleLatencies into SR? that
way we could replace case statements with polymorphism.
{quote}
The problem is that we have to calculate the expensive percentile calculation Async using
a scheduled TPE, We can avoid the switch by introducing additional SRFactory which will initialize
the TPE as per CF changes in the settings? Let me know.

{quote}
Why does preprocess return a boolean now?
{quote}
The current patch uses the boolean to understand if the processing was done or not.... its
used by RCB after the patch when there are more than 1 responses received by the co-ordinator
from the same host (When SR is on and the actual read response gets back at the same time
as the speculated response), we should not count that towards the consistency level.

{quote}
How does/should SR interact with RR? Using ALL + RRR
{quote}
Currently we are doing additional read to double check if we need to write, I thought the
goal for ALL will eliminate that and do additional write instead... Most cases it will be
a memtable update :)
I can think of 2 options:
1) Just document the ALL case and live with the additional writes, user might not be a big
issue for most cases and for the rest they can switch to the default behavior.
2) We can queue the repair Mutations, in the Async thread we can check if there are duplicate
mutations pending... if yes then we can just ignore the duplicates this can be done by doing
sendRR and adding the CF to be repaired in a HashSet (it takes additional memory footprint).

Should we move this discussion to a different ticket?

Let me know, Thanks!
                  
> Speculative execution for CL_ONE
> --------------------------------
>
>                 Key: CASSANDRA-4705
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4705
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>         Attachments: 0001-CASSANDRA-4705.patch, 0001-CASSANDRA-4705-v2.patch
>
>
> When read_repair is not 1.0, we send the request to one node for some of the requests. When
> a node goes down or when a node is too busy, the client has to wait for the timeout before
> it can retry.
> It would be nice to watch for latency and execute an additional request to a different
> node if the response is not received within the average/99th percentile of the response times
> recorded in the past.
> CASSANDRA-2540 might be able to solve the variance when read_repair is set to 1.0
> 1) Maybe we need to use metrics-core to record various percentiles
> 2) Modify ReadCallback.get to execute additional request speculatively.
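
The behavior described in the issue can be sketched as follows. This is a generic illustration
using java.util.concurrent, not the ReadCallback change itself; SpeculativeRead, its parameters
and the use of suppliers to stand in for replica reads are all assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: each Supplier stands in for a read sent to one replica.
final class SpeculativeRead {
    static <T> T read(Supplier<T> primary, Supplier<T> backup,
                      long thresholdMillis, Executor pool) {
        CompletableFuture<T> first = CompletableFuture.supplyAsync(primary, pool);
        try {
            // Normal case: the first replica answers within the latency threshold.
            return first.get(thresholdMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            // Threshold exceeded: ask a second replica and take whichever finishes first.
            // Note that anyOf also completes if either request fails; a real
            // implementation would need to handle that case.
            CompletableFuture<T> second = CompletableFuture.supplyAsync(backup, pool);
            @SuppressWarnings("unchecked")
            T winner = (T) CompletableFuture.anyOf(first, second).join();
            return winner;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The threshold passed in would come from the recorded latencies (average or a high percentile),
so the extra request is only sent for the slowest reads.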

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
