cassandra-commits mailing list archives

From "Philip Andronov (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Date Fri, 03 Feb 2012 11:43:54 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Andronov updated CASSANDRA-3843:
---------------------------------------

    Description: 
When reading at QUORUM consistency with a replication factor greater than 2, Cassandra sends
at least one ReadRepair, even when none is needed.

Because read requests wait until the ReadRepair finishes, this slows requests down
a lot, up to the point of a timeout :(

It seems that the problem was introduced by CASSANDRA-2494. Unfortunately, I do not know
enough about Cassandra internals to fix it without breaking the CASSANDRA-2494
functionality, so this report comes without a patch.

Code explanations:
{code:title=RangeSliceResponseResolver.java|borderStyle=solid}
class RangeSliceResponseResolver {
    // ....
    private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
    {
    // ....

        protected Row getReduced()
        {
            ColumnFamily resolved = versions.size() > 1
                                  ? RowRepairResolver.resolveSuperset(versions)
                                  : versions.get(0);
            if (versions.size() < sources.size())
            {
                for (InetAddress source : sources)
                {
                    if (!versionSources.contains(source))
                    {
                          
                        // [PA] Here we add a null ColumnFamily.
                        // Later it is compared with the "desired"
                        // version and yields a "fake" difference, which
                        // forces Cassandra to send a ReadRepair to the given source
                        versions.add(null);
                        versionSources.add(source);
                    }
                }
            }
            // ....
            if (resolved != null)
                repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
            // ....
        }
    }
}
{code}


{code:title=RowRepairResolver.java|borderStyle=solid}
public class RowRepairResolver extends AbstractRowResolver {
    // ....
    public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
    {
        List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());

        for (int i = 0; i < versions.size(); i++)
        {
            // Sooner or later we compare null with resolved, which are obviously
            // not equal, so a ReadRepair is fired even though it is not needed here
            ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
            if (diffCf == null)
                continue;
        // .... 
{code}
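
To make the mechanism above concrete, here is a minimal standalone sketch. It is not Cassandra code: the class NullVersionDiffSketch, the Map-based row model (column name -> write timestamp), and the simplified diff() are all assumptions made for illustration. It only shows that once a null version is appended for a source, the diff against the resolved row is never empty, so a repair is counted for that source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullVersionDiffSketch {
    // Hypothetical stand-in for a row version: column name -> write timestamp.
    // Returns the columns that `version` lacks or holds with an older timestamp
    // than `resolved`, or null when there is nothing to send.
    static Map<String, Long> diff(Map<String, Long> version, Map<String, Long> resolved) {
        if (version == null) {
            // Assumption: a null version differs from everything,
            // so the whole resolved row counts as the difference.
            return resolved;
        }
        Map<String, Long> missing = new HashMap<>();
        for (Map.Entry<String, Long> e : resolved.entrySet()) {
            Long have = version.get(e.getKey());
            if (have == null || have < e.getValue()) {
                missing.put(e.getKey(), e.getValue());
            }
        }
        return missing.isEmpty() ? null : missing;
    }

    // Counts how many entries of `versions` would trigger a repair.
    static int countRepairs(Map<String, Long> resolved, List<Map<String, Long>> versions) {
        int repairs = 0;
        for (Map<String, Long> version : versions) {
            if (diff(version, resolved) != null) {
                repairs++;
            }
        }
        return repairs;
    }

    public static void main(String[] args) {
        Map<String, Long> resolved = new HashMap<>();
        resolved.put("col", 2L);

        // Two replicas returned the row and already match the resolved version.
        List<Map<String, Long>> versions = new ArrayList<>();
        versions.add(new HashMap<>(resolved));
        versions.add(new HashMap<>(resolved));
        System.out.println(countRepairs(resolved, versions)); // prints 0

        // Padding with null for a source that contributed no version of the row.
        versions.add(null);
        System.out.println(countRepairs(resolved, versions)); // prints 1
    }
}
```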

Imagine the following situation:
NodeA has X.1 // row X at version 1
NodeB has X.2 
NodeC has X.? // unknown version, but because the write was at QUORUM it is 1 or 2

During a QUORUM read from nodes A and B, Cassandra creates version 12 and sends a ReadRepair,
so the nodes now have the following content:
NodeA has X.12
NodeB has X.12

which is correct. However, Cassandra also fires a ReadRepair to NodeC. There is no need to
do that: the next consistent read may be served by nodes A and B (no ReadRepair), or by any
pair that includes node C, in which case a ReadRepair will be fired that brings NodeC
to a consistent state.
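
The scenario can be replayed with a small self-contained model. Everything here is hypothetical (the class QuorumReadScenarioSketch, the column names c1 and c2, and the Map-based rows); it is a sketch of the behaviour described in this report, not of the actual resolver. Version "12" is modelled as the per-column superset of versions 1 and 2, and NodeC is represented by the null entry that the padding adds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class QuorumReadScenarioSketch {
    // Hypothetical model: a row version maps column name -> write timestamp.
    // Merges all non-null versions, keeping the newest timestamp per column.
    static Map<String, Long> resolveSuperset(List<Map<String, Long>> versions) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> version : versions) {
            if (version == null) {
                continue; // padded entry, nothing to merge
            }
            for (Map.Entry<String, Long> e : version.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::max);
            }
        }
        return merged;
    }

    // Returns the nodes whose version differs from the resolved row.
    // A null version never equals the resolved row, so it is always a target.
    static List<String> repairTargets(Map<String, Map<String, Long>> responses) {
        Map<String, Long> resolved = resolveSuperset(new ArrayList<>(responses.values()));
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> e : responses.entrySet()) {
            if (!resolved.equals(e.getValue())) {
                targets.add(e.getKey());
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // First read: A holds X.1, B holds X.2, C is padded with null.
        Map<String, Map<String, Long>> responses = new LinkedHashMap<>();
        responses.put("NodeA", Collections.singletonMap("c1", 1L));
        responses.put("NodeB", Collections.singletonMap("c2", 2L));
        responses.put("NodeC", null);
        System.out.println(repairTargets(responses)); // prints [NodeA, NodeB, NodeC]

        // Second read: A and B both hold X.12 after the repair, C is still padded.
        Map<String, Long> x12 = new TreeMap<>();
        x12.put("c1", 1L);
        x12.put("c2", 2L);
        responses.put("NodeA", x12);
        responses.put("NodeB", new TreeMap<>(x12));
        System.out.println(repairTargets(responses)); // prints [NodeC]
    }
}
```

In this model the second read still targets NodeC although A and B already agree, which is the extra repair described above.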

If you are reading from an index, then sooner or later you will get a TimeOutException because
the cluster is overloaded by ReadRepair requests *even* if all nodes have the same data :(

> Unnecessary  ReadRepair request during RangeScan
> ------------------------------------------------
>
>                 Key: CASSANDRA-3843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Philip Andronov
>            Priority: Critical
