cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
Date Wed, 14 Sep 2011 02:26:09 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104179#comment-13104179 ]

T Jake Luciani commented on CASSANDRA-2388:
-------------------------------------------

I just want to confirm what this ticket is about.

The JobTracker (JT) has a list of endpoints for a given split.
When a task runs, it may or may not be on one of those nodes.
If other tasks are already running on all of those replicas, the JT may place the task on a remote node.

So we need to decide which endpoint to connect to given the chance that nodes are down.

1. Check whether the node running CFRR (ColumnFamilyRecordReader) is one of the replicas
(we have this). If so, the JT has assigned a data-local task (good).
2. If none of the replicas is local, pick another one.
3. If the connection fails, try one of the other nodes.
4. Try to avoid endpoints in a different DC.
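
Steps 1-3 above could be sketched roughly as follows. This is only an illustration, not the patch attached to the ticket: the class `EndpointSelection`, the `Connector` callback, and both method names are hypothetical, and hosts are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class EndpointSelection {

    /** Hypothetical callback that opens a connection to one host, or throws. */
    interface Connector<C> {
        C connect(String host) throws Exception;
    }

    /** Steps 1-2: put the local host first if it is a replica, then the rest. */
    static List<String> orderEndpoints(List<String> replicas, String localHost) {
        List<String> ordered = new ArrayList<>();
        for (String replica : replicas) {
            if (replica.equals(localHost)) {
                ordered.add(0, replica);   // data-local replica goes to the front
            } else {
                ordered.add(replica);      // remote replicas keep their given order
            }
        }
        return ordered;
    }

    /** Step 3: try each endpoint in order and return the first connection that opens. */
    static <C> C connectToFirstAvailable(List<String> ordered, Connector<C> connector) {
        Exception lastFailure = null;
        for (String host : ordered) {
            try {
                return connector.connect(host);
            } catch (Exception e) {
                lastFailure = e;           // remember the failure and try the next replica
            }
        }
        throw new IllegalStateException("no replica reachable among " + ordered, lastFailure);
    }
}
```

With replicas `a`, `b`, `c` and a task running on `b`, `orderEndpoints` yields `b`, `a`, `c`; if `b` then refuses the connection, `connectToFirstAvailable` moves on to `a`. Step 4 (avoiding other DCs) is deliberately left out here, since it needs information the reader does not currently have.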

The biggest problem is 4. Maybe the way to do this is to change the getSplits logic to never
return replicas in another DC. I think this would require adding DC info to the describe_ring
call. Then we would only need to worry about 1-3.
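
To make the getSplits idea concrete, here is a minimal sketch of the filtering step. It assumes that a host-to-DC mapping is somehow available, which is exactly the information describe_ring would have to be extended to provide. The class `LocalDcFilter`, the method `localDcReplicas`, and the `dcByEndpoint` map are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; assumes DC info per endpoint is available to getSplits.
final class LocalDcFilter {

    /** Keeps only the replicas whose DC matches localDc, preserving their order. */
    static List<String> localDcReplicas(List<String> replicas,
                                        Map<String, String> dcByEndpoint,
                                        String localDc) {
        List<String> kept = new ArrayList<>();
        for (String replica : replicas) {
            // Endpoints with no known DC are dropped as well (get returns null).
            if (localDc.equals(dcByEndpoint.get(replica))) {
                kept.add(replica);
            }
        }
        return kept;
    }
}
```

The result can be empty when a split has no replica in the local DC. What getSplits should do in that case is not settled by this sketch.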

> ColumnFamilyRecordReader fails for a given split because a host is down, even if records
> could reasonably be read from other replica.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.6
>            Reporter: Eldon Stegall
>            Assignee: Mck SembWever
>              Labels: hadoop, inputformat
>             Fix For: 0.8.6
>
>         Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch,
>                      CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch,
>                      CASSANDRA-2388.patch, CASSANDRA-2388.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We should try
> multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
