cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12884) Batch logic can lead to unbalanced use of system.batches
Date Thu, 10 Aug 2017 20:35:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122224#comment-16122224 ]

Jeff Jirsa edited comment on CASSANDRA-12884 at 8/10/17 8:34 PM:
-----------------------------------------------------------------

[~iamaleksey] will have a more comprehensive review, I'm sure, but a few notes from a very
cursory glance:

-1) I don't see the purpose of stubbing out {{BatchlogManager::shuffle}} as a helper function here.- (You're overriding it for deterministic testing.)

2) In the case where {{validated.keySet().size() == 1}}, shuffling all of the IPs in a given
rack may not be all that efficient; it may be quicker to just pick 2 random ints and grab the
IPs at those offsets (as we do for the case where we have more than 2 racks:
{{result.add(rackMembers.get(getRandomInt(rackMembers.size())));}}).
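A minimal sketch of the offset-based selection suggested in 2), written as plain Java rather than Cassandra's actual code. The class `RackPicker` and method `pickTwo` are hypothetical; `getRandomInt` is modeled on the helper named in the comment but implemented here with `ThreadLocalRandom` as an assumption. Keeping the random draw in an overridable method is one way to get the deterministic testing mentioned in 1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration, not Cassandra's BatchlogManager code.
public class RackPicker
{
    // Overridable randomness hook: a test subclass can return fixed values.
    protected int getRandomInt(int bound)
    {
        return ThreadLocalRandom.current().nextInt(bound);
    }

    // Picks up to two distinct members of one rack by drawing two random
    // offsets, instead of shuffling the whole member list.
    public <T> List<T> pickTwo(List<T> rackMembers)
    {
        int n = rackMembers.size();
        if (n <= 2)
            return new ArrayList<>(rackMembers);

        int first = getRandomInt(n);
        // Draw from the remaining n - 1 slots and skip past `first`,
        // so the second offset is distinct and still uniformly distributed.
        int second = getRandomInt(n - 1);
        if (second >= first)
            second++;

        List<T> result = new ArrayList<>(2);
        result.add(rackMembers.get(first));
        result.add(rackMembers.get(second));
        return result;
    }
}
```

For a random-access list this costs two `get` calls rather than an O(n) shuffle. A test can subclass `RackPicker` and override `getRandomInt` to make the picks predictable.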




was (Author: jjirsa):
[~iamaleksey] will have a more comprehensive review, I'm sure, but a few notes from a very
cursory glance:

1) I don't see the purpose of stubbing out {{BatchlogManager::shuffle}} as a helper function
here.

2) In the case where {{validated.keySet().size() == 1}}, shuffling all of the IPs in a given
rack may not be all that efficient; it may be quicker to just pick 2 random ints and grab the
IPs at those offsets (as we do for the case where we have more than 2 racks:
{{result.add(rackMembers.get(getRandomInt(rackMembers.size())));}}).



> Batch logic can lead to unbalanced use of system.batches
> --------------------------------------------------------
>
>                 Key: CASSANDRA-12884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12884
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Adam Hattrell
>            Assignee: Daniel Cranford
>             Fix For: 3.0.x, 3.11.x
>
>         Attachments: 0001-CASSANDRA-12884.patch
>
>
> It looks as though there are some odd edge cases in how we distribute the copies in system.batches.
> The main issue is in the {{filter}} method of {{org.apache.cassandra.batchlog.BatchlogManager}}:
> {code:java}
> if (validated.size() - validated.get(localRack).size() >= 2)
> {
>     // we have enough endpoints in other racks
>     validated.removeAll(localRack);
> }
>
> if (validated.keySet().size() == 1)
> {
>     // we have only 1 `other` rack
>     Collection<InetAddress> otherRack = Iterables.getOnlyElement(validated.asMap().values());
>     return Lists.newArrayList(Iterables.limit(otherRack, 2));
> }
> {code}
> So with one or two racks we just return the first 2 entries in the list. There's no
> shuffle or randomisation here.
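To make the imbalance concrete, here is a small hypothetical simulation (not Cassandra code) of a single remaining rack with five made-up addresses. `firstTwo` mimics `Lists.newArrayList(Iterables.limit(otherRack, 2))` without depending on Guava, and `randomTwo` uses `Collections.shuffle` as one possible randomized alternative. The class name, method names, and addresses are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical simulation: counts how many batchlog copies each endpoint
// in a single remaining rack would receive under each selection strategy.
public class BatchlogBalance
{
    // Same effect as Lists.newArrayList(Iterables.limit(rack, 2)): always the first two.
    public static List<String> firstTwo(List<String> rack)
    {
        return new ArrayList<>(rack.subList(0, Math.min(2, rack.size())));
    }

    // Randomized alternative: shuffle a copy and take its first two members.
    public static List<String> randomTwo(List<String> rack, Random rnd)
    {
        List<String> copy = new ArrayList<>(rack);
        Collections.shuffle(copy, rnd);
        return copy.subList(0, Math.min(2, copy.size()));
    }

    // Tallies how often each endpoint is chosen over `batches` writes.
    public static Map<String, Integer> tally(List<String> rack, int batches, boolean randomize, Random rnd)
    {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String ep : rack)
            counts.put(ep, 0);
        for (int i = 0; i < batches; i++)
        {
            List<String> chosen = randomize ? randomTwo(rack, rnd) : firstTwo(rack);
            for (String ep : chosen)
                counts.merge(ep, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        List<String> rack = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5");
        Random rnd = new Random(42);
        System.out.println("first two: " + tally(rack, 10_000, false, rnd));
        System.out.println("shuffled:  " + tally(rack, 10_000, true, rnd));
    }
}
```

Because `firstTwo` ignores the random source, its tally always gives each of the first two addresses all 10,000 copies and the other three none; the shuffled tally spreads the 20,000 copies across all five, at roughly 4,000 each.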



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
