falcon-dev mailing list archives

From "Peeyush Bishnoi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1165) Falcon restart failed, if defined service in cluster entity is unreachable
Date Tue, 21 Apr 2015 14:37:59 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505041#comment-14505041 ]

Peeyush Bishnoi commented on FALCON-1165:

On analysis, I found that this issue occurs when Falcon reloads the cluster entities on restart
and tries to ensure that the jar files in the HDFS working lib directory are up to date. If the
HDFS service on a remote cluster is unavailable (for example, down for maintenance), Falcon
fails to restart on the source cluster and logs the exception.

The proposed approach: when Falcon restarts and reloads the cluster entities, it should check
whether the remote HDFS service (on cluster Z) is available. If it is not available, Falcon
should not try to update the jar files in the HDFS working lib directory, but it should still
start on the source cluster (cluster X). That way, at least replication/processing can continue
between the source cluster and the other available remote cluster (cluster Y). Please share
your thoughts on this approach.
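
To make the proposal concrete, here is a minimal sketch of the "check, then skip" logic. It is
not Falcon code: the class ClusterReloadSketch, the methods isReachable and reloadClusters, and
the updateLibs callback are all hypothetical names made up for illustration, and a plain TCP
connect is only assumed to be an acceptable stand-in for "the remote HDFS service is available".

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch, not part of Falcon.
class ClusterReloadSketch {

    // Assumption: a successful TCP connect within timeoutMs means the service is up.
    // Returns false on any connection failure or invalid host/port/timeout.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }

    // Reloads every cluster, but only refreshes the working lib jars on clusters
    // that pass the reachability check. Unreachable clusters are skipped instead
    // of aborting startup. Returns the names of the skipped clusters.
    static List<String> reloadClusters(List<String> clusterNames,
                                       Predicate<String> reachable,
                                       Consumer<String> updateLibs) {
        List<String> skipped = new ArrayList<>();
        for (String name : clusterNames) {
            if (reachable.test(name)) {
                updateLibs.accept(name);   // e.g. push up-to-date jars to HDFS
            } else {
                skipped.add(name);         // log and continue; do not fail the restart
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Simulate the scenario from the issue: cluster Z is down, cluster Y is up.
        List<String> skipped = reloadClusters(
                Arrays.asList("Y", "Z"),
                name -> !name.equals("Z"),
                name -> System.out.println("Updating libs on cluster " + name));
        System.out.println("Skipped unreachable clusters: " + skipped);
    }
}
```

In a real change, the reachable predicate would presumably wrap a check against the HDFS
endpoint configured in the cluster entity, and skipped clusters would need their libs
refreshed once they come back. Both points are left open here.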

> Falcon restart failed, if defined service in cluster entity is unreachable
> --------------------------------------------------------------------------
>                 Key: FALCON-1165
>                 URL: https://issues.apache.org/jira/browse/FALCON-1165
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Peeyush Bishnoi
>            Assignee: Peeyush Bishnoi
>             Fix For: 0.7
> Falcon fails to restart if any service in the cluster entity is unreachable or down.
> For example, suppose there are clusters X, Y and Z. On cluster X, submit cluster entities that
> point to the services of clusters Y and Z, and run some replication jobs from cluster X to
> cluster Y and to cluster Z. If the HDFS service on cluster Z later goes down for maintenance
> and the Falcon service on cluster X has to be restarted at the same time for some reason,
> Falcon will fail to restart on cluster X.
> This issue has been reported internally at Hortonworks.
