chukwa-dev mailing list archives

From "Sourygna Luangsay (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CHUKWA-593) Archive daemon: infinite loop at midnight
Date Tue, 28 Jun 2011 21:47:28 GMT
Archive daemon: infinite loop at midnight
-----------------------------------------

                 Key: CHUKWA-593
                 URL: https://issues.apache.org/jira/browse/CHUKWA-593
             Project: Chukwa
          Issue Type: Bug
          Components: MR Data Processors
    Affects Versions: 0.4.0
         Environment: Debian 5.0, Hadoop 0.20
            Reporter: Sourygna Luangsay
            Priority: Minor
             Fix For: 0.5.0, 0.4.0


The Chukwa archive manager daemon enters an infinite loop between midnight and 1:00 AM. This
increases the namenode load and makes both the Chukwa and namenode logs grow very quickly.

The problem seems to come from the start function of ChukwaArchiveManager.java (in package
org/apache/hadoop/chukwa/extraction/archive).
At midnight, there are two directories in /chukwa/dataSinkArchives/ (one for the previous day
and one for the new day). This means that we enter neither the "daysInRawArchiveDir.length ==
0" condition nor the "daysInRawArchiveDir.length == 1" one. The processDay function is then
called, but it does very little because of the "modificationDate < oneHourAgo" condition.
Finally, we loop again without having slept or deleted the previous day's directory. This
repeats for one hour.
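
To make the control flow above concrete, here is a minimal sketch of the branch logic as I understand it. It is not the actual ChukwaArchiveManager code: the class name, the method name sleepsBeforeNextPass, and the boolean return value are hypothetical, and I assume that the "length == 0" branch always sleeps and that the "length == 1" branch sleeps only while now < nextRun.

```java
// Hypothetical model of one pass through the start() loop; not the real Chukwa code.
class ArchiveLoopSketch {
    static final long ONE_HOUR = 60L * 60 * 1000;

    /**
     * Returns true if this pass would sleep before looping again,
     * false if it would fall through to processDay without sleeping.
     */
    static boolean sleepsBeforeNextPass(int dayDirCount, long now, long lastRun) {
        if (dayDirCount == 0) {
            return true; // assumption: nothing to archive, so the daemon waits
        }
        if (dayDirCount == 1) {
            long nextRun = lastRun + (2 * ONE_HOUR) - (1 * 60 * 1000); // 2h - 1min
            if (now < nextRun) {
                return true; // too soon after the last run, so the daemon waits
            }
        }
        // Two or more day directories: neither branch applies, so no sleep happens.
        return false;
    }

    public static void main(String[] args) {
        long lastRun = 0L;
        long now = 10L * 60 * 1000; // ten minutes after the last run
        System.out.println("1 directory sleeps:   " + sleepsBeforeNextPass(1, now, lastRun));
        System.out.println("2 directories sleeps: " + sleepsBeforeNextPass(2, now, lastRun));
    }
}
```

With two directories the pass never sleeps, which matches the tight loop seen after midnight when processDay has nothing old enough to process.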

Here is how I propose to change the "daysInRawArchiveDir.length == 1" condition block in the
start function:
        if (daysInRawArchiveDir.length >= 1) {
          long nextRun = lastRun + (2*ONE_HOUR) - (1*60*1000); // 2h - 1min
          if (now < nextRun) {
            log.info("lastRun < 2 hours so skip archive for now, going to sleep for 30 minutes, currentDate is:" + new java.util.Date());
            Thread.sleep(30 * 60 * 1000);
            continue;
          }
        }
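
The effect of the proposed ">= 1" guard can be checked in isolation with a small sketch. The class and method names (ProposedGuardSketch, nextRun, shouldSleep) are hypothetical and exist only for illustration; the arithmetic is copied from the block above, and the empty-directory case is assumed to be handled by the separate "length == 0" branch.

```java
// Hypothetical, standalone model of the proposed guard; not the real Chukwa code.
class ProposedGuardSketch {
    static final long ONE_HOUR = 60L * 60 * 1000;

    /** Earliest time of the next archive run: 2 hours minus 1 minute after lastRun. */
    static long nextRun(long lastRun) {
        return lastRun + (2 * ONE_HOUR) - (1 * 60 * 1000);
    }

    /** True if the guard would sleep for 30 minutes and skip archiving on this pass. */
    static boolean shouldSleep(int dayDirCount, long now, long lastRun) {
        return dayDirCount >= 1 && now < nextRun(lastRun);
    }

    public static void main(String[] args) {
        long lastRun = 0L;
        long tenMinutes = 10L * 60 * 1000;
        // The window is 7,140,000 ms, i.e. 119 minutes.
        System.out.println("window in ms: " + (nextRun(lastRun) - lastRun));
        // Two day directories shortly after the last run: the guard now sleeps.
        System.out.println("2 dirs, 10 min after lastRun: " + shouldSleep(2, tenMinutes, lastRun));
        // Once the window has elapsed, archiving proceeds as before.
        System.out.println("2 dirs, at nextRun: " + shouldSleep(2, nextRun(lastRun), lastRun));
    }
}
```

Under these assumptions, the only behavioral difference from the "== 1" version is that passes with two or more day directories are also throttled to the 30-minute sleep while now < nextRun.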

In my environment, this change removed the infinite loop. But maybe there is a reason to
separate the "1 directory" case from the "many directories" case. I have read the
documentation and the Subversion history but could not find one.
If there is one, could someone explain it to me?

Regards.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
