cassandra-commits mailing list archives

From "Mck SembWever (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
Date Tue, 28 Jun 2011 14:15:17 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056524#comment-13056524
] 

Mck SembWever commented on CASSANDRA-2388:
------------------------------------------

bq. It looks like there's a ton of effort put in to avoiding making sortByProximity work w/
non-local nodes
Because it's only when that local node is down that we actually need to sort...
When/if DynamicEndpointSnitch's limitation is fixed (so that it can sort by non-local nodes),
CassandraServer.java need not bypass it. But this won't simplify the code in CFRR. Now that
CFIF supports multiple initialAddresses, the method sortEndpointsByProximity(..) in CFIF can
be rewritten (i.e. a connection to any initialAddress is all we need; no need to mess around
with trying to connect through replicas to find information about replicas...)
bq. Wait, why do we even care? "local node" IS the right host to sort against
No. "initialAddress" is the right node to sort against. And it should be the "local node". And
then we don't care about the replicas.
But when "initialAddress" is down, we connect to another c* node at random to find out which
of the replicas we know about are 1) up, 2) closest, and 3) in the same dc. That random c*
node then becomes the "local node", and the call needs to be
{{snitch.sortByProximity(initialAddress, addresses)}}.
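As a rough illustration of the three criteria above (up, closest, same dc), here is a minimal Java sketch. ReplicaInfo, its fields, and usableReplicas(..) are hypothetical names made up for this example and are not part of Cassandra; the precomputed "distance" stands in for whatever ordering the snitch would actually provide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReplicaOrdering {
    /** Hypothetical record of what the contacted node reports about one replica. */
    static final class ReplicaInfo {
        final String host;
        final boolean up;
        final String dc;
        final int distance; // assumed proximity score, lower is closer

        ReplicaInfo(String host, boolean up, String dc, int distance) {
            this.host = host;
            this.up = up;
            this.dc = dc;
            this.distance = distance;
        }
    }

    /** Keeps replicas that are up and in the contact node's dc, closest first. */
    static List<String> usableReplicas(String contactDc, List<ReplicaInfo> replicas) {
        List<ReplicaInfo> kept = new ArrayList<>();
        for (ReplicaInfo r : replicas) {
            if (r.up && r.dc.equals(contactDc)) {
                kept.add(r);
            }
        }
        kept.sort(Comparator.comparingInt((ReplicaInfo r) -> r.distance));

        List<String> hosts = new ArrayList<>();
        for (ReplicaInfo r : kept) {
            hosts.add(r.host);
        }
        return hosts;
    }
}
```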
But yes... the CFRR code is contorted. In many ways I prefer the simplicity of the first patch
(both in API and in implementation) despite it not being "as correct". I thought of this "fallback
to replica" as a last resort to keep the m/r job running, rather than an actively used feature
where DynamicEndpointSnitch's scores will maximise performance. But then I'm only thinking
in terms of a small c* cluster, and I am certainly naive about what performance gains these
scores can give...
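The "fallback to replica" idea, in its simplest last-resort form, could look something like the sketch below: try each location of a split in order and use the first one that accepts a connection. This is not the code from any of the attached patches; the Connector interface and connectToFirstAvailable(..) are hypothetical names, and the Thrift connection is reduced to a callback that may throw.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class SplitLocationFallback {
    /** Hypothetical stand-in for opening a client connection to one host. */
    interface Connector<C> {
        C connect(String host) throws IOException;
    }

    /** Returns the first connection that succeeds, trying locations in the given order. */
    static <C> C connectToFirstAvailable(List<String> locations, Connector<C> connector)
            throws IOException {
        IOException last = null;
        for (String host : locations) {
            try {
                return connector.connect(host);
            } catch (IOException e) {
                last = e; // remember the failure and fall through to the next replica
            }
        }
        throw new IOException("no location reachable for split: " + locations, last);
    }

    public static void main(String[] args) throws IOException {
        // Example addresses are placeholders; the first one is treated as down.
        List<String> locations = Arrays.asList("10.0.0.1", "10.0.0.2");
        String used = connectToFirstAvailable(locations, host -> {
            if (host.equals("10.0.0.1")) {
                throw new IOException("host down");
            }
            return host;
        });
        System.out.println("connected via " + used);
    }
}
```

The order of the list is what the proximity sorting discussed above would decide; the loop itself does not care how that order was produced.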

> ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.7.6, 0.8.0
>            Reporter: Eldon Stegall
>            Assignee: Jeremy Hanna
>              Labels: hadoop, inputformat
>             Fix For: 0.7.7, 0.8.2
>
>         Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
