Message-ID: <1306085278.1253227077554.JavaMail.jira@brutus>
Date: Thu, 17 Sep 2009 15:37:57 -0700 (PDT)
From: "Hong Tang (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1000) JobHistory.initDone() should retain the try ... catch in the body
In-Reply-To: <660774914.1253226597853.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756826#action_12756826 ]

Hong Tang commented on MAPREDUCE-1000:
--------------------------------------

MAPREDUCE-157 changed JobHistory.initDone() and removed the try...catch clause around its body. The try...catch is necessary because, without it, an IOException (IOE) thrown during initialization propagates out of the JobTracker constructor and the JT is aborted. I observed this when testing MAPREDUCE-728.

Symptom:
{noformat}
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/Users/htang/Documents/Work/workspace/hadoop-mapreduce/build/hadoop-mapred-0.21.0-dev/logs/history/job_200904211745_0010_geek5 at 523264
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:221)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:72)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:45)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:97)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:220)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:143)
	at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:55)
	at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:1203)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistory.moveToDoneNow(JobHistory.java:338)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistory.moveOldFiles(JobHistory.java:372)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistory.initDone(JobHistory.java:145)
	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:3900)
	at org.apache.hadoop.mapred.SimulatorJobTracker.<init>(SimulatorJobTracker.java:80)
{noformat}

The previous run of the JT had been killed, which left the job history file inconsistent with its CRC checksum.

The following patch excerpts show the removal of the try...catch clause:

Before MAPREDUCE-157
{noformat}
-  static boolean initDone(JobConf conf, FileSystem fs){
-    try {
-      //if completed job history location is set, use that
-      String doneLocation = conf.
-                       get("mapred.job.tracker.history.completed.location");
-      if (doneLocation != null) {
-        DONE = fs.makeQualified(new Path(doneLocation));
-        DONEDIR_FS = fs;
-      } else {
-        DONE = new Path(LOG_DIR, "done");
-        DONEDIR_FS = LOGDIR_FS;
-      }
-
-      //If not already present create the done folder with appropriate
-      //permission
-      if (!DONEDIR_FS.exists(DONE)) {
-        LOG.info("Creating DONE folder at "+ DONE);
-        if (! DONEDIR_FS.mkdirs(DONE,
-            new FsPermission(HISTORY_DIR_PERMISSION))) {
-          throw new IOException("Mkdirs failed to create " + DONE.toString());
-        }
-      }
-
-      fileManager.start();
-      //move the log files remaining from last run to the DONE folder
-      //suffix the file name based on Jobtracker identifier so that history
-      //files with same job id don't get over written in case of recovery.
-      FileStatus[] files = LOGDIR_FS.listStatus(new Path(LOG_DIR));
-      String jtIdentifier = fileManager.jobTracker.getTrackerIdentifier();
-      String fileSuffix = "." + jtIdentifier + OLD_SUFFIX;
-      for (FileStatus fileStatus : files) {
-        Path fromPath = fileStatus.getPath();
-        if (fromPath.equals(DONE)) { //DONE can be a subfolder of log dir
-          continue;
-        }
-        LOG.info("Moving log file from last run: " + fromPath);
-        Path toPath = new Path(DONE, fromPath.getName() + fileSuffix);
-        fileManager.moveToDoneNow(fromPath, toPath);
-      }
-    } catch(IOException e) {
-      LOG.error("Failed to initialize JobHistory log file", e);
-      disableHistory = true;
-    }
-    return !(disableHistory);
-  }
{noformat}

After MAPREDUCE-157
{noformat}
+  /** Initialize the done directory and start the history cleaner thread */
+  public void initDone(JobConf conf, FileSystem fs) throws IOException {
+    //if completed job history location is set, use that
+    String doneLocation =
+      conf.get("mapred.job.tracker.history.completed.location");
+    if (doneLocation != null) {
+      done = fs.makeQualified(new Path(doneLocation));
+      doneDirFs = fs;
+    } else {
+      done = logDirFs.makeQualified(new Path(logDir, "done"));
+      doneDirFs = logDirFs;
+    }
+
+    //If not already present create the done folder with appropriate
+    //permission
+    if (!doneDirFs.exists(done)) {
+      LOG.info("Creating DONE folder at "+ done);
+      if (! doneDirFs.mkdirs(done,
+          new FsPermission(HISTORY_DIR_PERMISSION))) {
+        throw new IOException("Mkdirs failed to create " + done.toString());
+      }
+    }
+    LOG.info("Inited the done directory to " + done.toString());
+
+    moveOldFiles();
+    startFileMoverThreads();
+
+    // Start the History Cleaner Thread
+    long maxAgeOfHistoryFiles = conf.getLong(
+        "mapreduce.cluster.jobhistory.maxage", DEFAULT_HISTORY_MAX_AGE);
+    historyCleanerThread = new HistoryCleaner(maxAgeOfHistoryFiles);
+    historyCleanerThread.start();
+  }
{noformat}

> JobHistory.initDone() should retain the try ... catch in the body
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Hong Tang
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
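
For illustration only, the behaviour argued for in the comment above can be sketched in isolation. This is not the Hadoop code and not a proposed patch: HistoryInitSketch, FileMover, and movedFiles() are hypothetical names invented for this example. It assumes the pre-MAPREDUCE-157 contract, in which initDone() catches the IOException, disables history, and returns a boolean instead of letting the exception reach the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for JobHistory; not the Hadoop class. */
final class HistoryInitSketch {

    /** Hypothetical stand-in for moving one leftover history file. */
    interface FileMover {
        void move(String fileName) throws IOException;
    }

    private boolean historyDisabled = false;
    private final List<String> moved = new ArrayList<>();

    /**
     * Mirrors the older contract: never throws, and returns true only if
     * history is still usable afterwards. As in the older code, the whole
     * loop sits inside one try block, so the first failure stops the move
     * of any remaining files.
     */
    boolean initDone(List<String> leftoverFiles, FileMover mover) {
        try {
            for (String name : leftoverFiles) {
                mover.move(name);
                moved.add(name);
            }
        } catch (IOException e) {
            // Log and degrade instead of aborting the caller's startup.
            System.err.println("Failed to initialize JobHistory log file: " + e.getMessage());
            historyDisabled = true;
        }
        return !historyDisabled;
    }

    /** Files that were moved before any failure occurred. */
    List<String> movedFiles() {
        return moved;
    }

    public static void main(String[] args) {
        HistoryInitSketch history = new HistoryInitSketch();
        // Simulate a leftover file whose checksum no longer matches.
        boolean usable = history.initDone(
            Arrays.asList("job_a", "job_corrupt", "job_c"),
            name -> {
                if (name.equals("job_corrupt")) {
                    throw new IOException("Checksum error: " + name);
                }
            });
        System.out.println("history usable: " + usable);
        System.out.println("moved before failure: " + history.movedFiles());
    }
}
```

In this sketch a corrupt leftover file only disables history; the caller, which stands in for the JobTracker constructor, keeps running. Whether the real fix should catch the exception inside initDone() or at the call site is a design choice the comment leaves open.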