hadoop-common-dev mailing list archives

From "Ari Rabkin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4805) Remove black list feature from Chukwa Agent to Chukwa Collector communication
Date Wed, 10 Dec 2008 04:52:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655094#action_12655094 ]

Ari Rabkin commented on HADOOP-4805:
------------------------------------

I always preferred detecting and correcting collector congestion on the collector side. We
might also have collectors start spitting back HTTP errors if they cross some specified load
average.  
+1 to patch.
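
The collector-side idea in this comment could look roughly like the sketch below. It is only an illustration: the function name `collector_status`, the `max_load` parameter, and the choice of HTTP 503 as the error code are assumptions and are not taken from the Chukwa code or from the patch.

```python
import os
from typing import Callable


def collector_status(
    max_load: float,
    load_1min: Callable[[], float] = lambda: os.getloadavg()[0],
) -> int:
    """Pick an HTTP status for an incoming POST based on host load.

    Hypothetical helper: returns 503 (Service Unavailable) when the
    1-minute load average is above max_load, otherwise 200.
    The load source is injectable so the check can be exercised
    without depending on the real machine load (os.getloadavg is
    only available on Unix-like systems).
    """
    return 503 if load_1min() > max_load else 200
```

An agent that receives the error status would treat it like any other failed POST and move on to another collector.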

> Remove black list feature from Chukwa Agent to Chukwa Collector communication
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-4805
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4805
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: Redhat EL 5, Java 6
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>         Attachments: HADOOP-4805.patch
>
>
> Recently, a new load-balancing algorithm was added to improve Chukwa agent to Chukwa collector
> communication.  The design was to send one HTTP POST per collector and rotate through the
> list of collectors to spread load across them.  When a collector fails to respond, it
> is blacklisted for 5 minutes.  If no collector responds, the agent sleeps for a random 1-5
> minutes.  Unfortunately, this algorithm caused problems on slower machines: they
> end up blacklisting all collectors and sleeping indefinitely.  This ticket restores
> the original design.  The agent shuffles the collector list.  It then makes a best-effort
> attempt to keep sending HTTP POSTs to the same collector until an error occurs, at which
> point it iterates through the shuffled list of collectors.
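
The restored behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in HADOOP-4805.patch: the class name `CollectorSelector`, the `send` callback, and the decision to stop after one full pass over the list are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


class CollectorSelector:
    """Hypothetical helper: stick with one collector until a POST fails."""

    def __init__(self, collectors: Sequence[str],
                 rng: Optional[random.Random] = None) -> None:
        if not collectors:
            raise ValueError("at least one collector is required")
        self._order: List[str] = list(collectors)
        # Shuffle once up front so different agents start on different collectors.
        (rng or random.Random()).shuffle(self._order)
        self._index = 0

    @property
    def current(self) -> str:
        """Collector that the next POST will be sent to first."""
        return self._order[self._index]

    def post(self, payload: bytes,
             send: Callable[[str, bytes], bool]) -> Optional[str]:
        """Send payload, preferring the current collector.

        `send` is an assumed callback that performs the HTTP POST and
        returns True on success. On failure the selector advances to the
        next collector in the shuffled order. Returns the collector that
        accepted the payload, or None after every collector has failed
        once. No collector is ever blacklisted.
        """
        for _ in range(len(self._order)):
            target = self.current
            if send(target, payload):
                return target  # success: keep using this collector
            self._index = (self._index + 1) % len(self._order)
        return None
```

Because nothing is blacklisted, a slow agent that sees every collector fail simply gets `None` back and can retry on its next cycle instead of sleeping indefinitely.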

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

