cassandra-commits mailing list archives

From "Shenghua Wan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure
Date Mon, 28 Sep 2015 20:21:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933906#comment-14933906 ]

Shenghua Wan edited comment on CASSANDRA-10347 at 9/28/15 8:20 PM:
-------------------------------------------------------------------

First, thank you for looking into this issue.

[~pauloricardomg] To your question: I have not tried the mapreduce.output.bulkoutputformat.maxfailedhosts
property. From reading the source code, my understanding is that this property only makes the job give up once a
certain number of host connections have failed. However, I want streaming to continue as long as some hosts are
alive, even when the number of failed hosts exceeds that threshold. Therefore, to solve the problem in my use
case (skipping connections to lost hosts), I have implemented something like a "mapreduce.output.bulkoutputformat.ignorehosts"
property. You can find my implementation in the attached source file AbstractBulkRecordWriter.java,
lines 75 and 164-169. It relies on the existing API
StreamResultFuture SSTableLoader.stream(Set<InetAddress> toIgnore, StreamEventHandler... listeners).
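A minimal sketch of how a comma-separated ignore-hosts setting might be turned into the `Set<InetAddress>` that `stream(Set<InetAddress> toIgnore, StreamEventHandler... listeners)` expects. The property name `mapreduce.output.bulkoutputformat.ignorehosts` is the commenter's proposal, not an existing Cassandra option; `IgnoreHostsSketch` and `parseIgnoreHosts` are hypothetical names, and a plain `Map<String, String>` stands in for the Hadoop `Configuration` so the example runs without Cassandra or Hadoop on the classpath.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class IgnoreHostsSketch {
    // Hypothetical property name, modeled on the one proposed in the comment.
    static final String IGNORE_HOSTS = "mapreduce.output.bulkoutputformat.ignorehosts";

    // Parses a comma-separated host list into the set that would be passed as toIgnore.
    // Assumption: a plain Map stands in for the Hadoop Configuration object.
    static Set<InetAddress> parseIgnoreHosts(Map<String, String> conf) {
        String raw = conf.get(IGNORE_HOSTS);
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptySet();
        }
        Set<InetAddress> hosts = new LinkedHashSet<>(); // keeps the configured order
        for (String host : raw.split(",")) {
            String trimmed = host.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            try {
                // For an IP literal, getByName only validates the format; hostnames need DNS.
                hosts.add(InetAddress.getByName(trimmed));
            } catch (UnknownHostException e) {
                throw new IllegalArgumentException("Cannot resolve ignored host: " + trimmed, e);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(IGNORE_HOSTS, "10.0.0.2, 10.0.0.3");
        Set<InetAddress> toIgnore = parseIgnoreHosts(conf);
        // In the real writer, this set would be handed to loader.stream(toIgnore, ...).
        System.out.println(toIgnore.size()); // prints 2
    }
}
```

Failing fast on an unresolvable hostname is one possible choice; a writer could instead log and skip it, since the goal is only to avoid contacting hosts known to be down.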

I intended to submit a patch via git, but for some reason I was not able to. This is the
most I can do at the moment.




was (Author: wanshenghua):
First, thank you for looking into this issue.

[~pauloricardomg] To your question: I have not tried the mapreduce.output.bulkoutputformat.maxfailedhosts
property. From reading the source code, my understanding is that this property only makes the job give up once a
certain number of host connections have failed. However, I want streaming to continue as long as some hosts are
alive, even when the number of failed hosts exceeds that threshold. Therefore, to solve the problem in my use
case (skipping connections to lost hosts), I have implemented something like a "mapreduce.output.bulkoutputformat.ignorehosts"
property, e.g.


> Bulk Loader API could not tolerate even node failure
> ----------------------------------------------------
>
>                 Key: CASSANDRA-10347
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10347
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Shenghua Wan
>            Assignee: Paulo Motta
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
>         Attachments: AbstractBulkRecordWriter.java
>
>
> When a user uses CqlBulkOutputFormat, it tries to stream to all the nodes in the token
> range, including dead nodes, so the stream fails. The C* API was designed to let the
> stream() method take a set of hosts to ignore, but this was never used.
> The no-argument stream() method is called in all existing versions of C*, e.g.
> in v2.0.11, https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> in v2.1.5, https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> and in the current trunk branch, https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241
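The failure described in the issue can be modeled in a few lines. This is a deliberately simplified, hypothetical model, not Cassandra's streaming code: `StreamTargetsSketch` and `streamToRange` are illustrative names, hosts are plain strings, and "streaming" to a host only checks whether it is in a set of live hosts. Every replica in the range is contacted unless it is in `toIgnore`, and the job fails once more than `maxFailedHosts` contacted hosts are unreachable, which matches how the comment describes the maxfailedhosts property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StreamTargetsSketch {
    // Simplified model: "streams" to each replica, skipping hosts in toIgnore, and
    // fails if more than maxFailedHosts of the contacted hosts are unreachable.
    static List<String> streamToRange(List<String> replicas, Set<String> liveHosts,
                                      Set<String> toIgnore, int maxFailedHosts) {
        List<String> streamed = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String host : replicas) {
            if (toIgnore.contains(host)) {
                continue; // never contacted, so it cannot fail the session
            }
            if (liveHosts.contains(host)) {
                streamed.add(host);
            } else {
                failed.add(host);
            }
        }
        if (failed.size() > maxFailedHosts) {
            throw new IllegalStateException("Too many hosts failed: " + failed);
        }
        return streamed;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Set<String> live = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.3"));
        // Empty ignore set (like the no-argument stream()): the dead 10.0.0.2 fails the job.
        try {
            streamToRange(replicas, live, Collections.emptySet(), 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Too many hosts failed: [10.0.0.2]
        }
        // Ignoring the dead host lets streaming to the live replicas go through.
        Set<String> toIgnore = Collections.singleton("10.0.0.2");
        System.out.println(streamToRange(replicas, live, toIgnore, 0)); // [10.0.0.1, 10.0.0.3]
    }
}
```

In this model, passing an ignore set corresponds to calling stream(toIgnore, ...) instead of the no-argument stream(), while raising maxFailedHosts tolerates the dead host but still attempts to contact it.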



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
