Date: Tue, 8 Jan 2013 13:04:12 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-9184) Some reducers failing to write final output file to s3.

    [ https://issues.apache.org/jira/browse/HADOOP-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546849#comment-13546849 ]

Hadoop QA commented on HADOOP-9184:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12563745/HADOOP-9184-branch-0.20.patch
  against trunk revision .

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2012//console

This message is automatically generated.

> Some reducers failing to write final output file to s3.
> -------------------------------------------------------
>
>                 Key: HADOOP-9184
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9184
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Jeremy Karn
>         Attachments: example.pig, HADOOP-9184-branch-0.20.patch, hadoop-9184.patch, task_log.txt
>
>
> We had a Hadoop job that was running 100 reducers with most of the reducers expected to write out an empty file. When the final output was to an S3 bucket we were finding that sometimes we were missing a final part file.  This was happening approximately 1 job in 3 (so approximately 1 reducer out of 300 was failing to output the data properly). I've attached the pig script we were using to reproduce the bug.
> After an in depth look and instrumenting the code we traced the problem to moveTaskOutputs in FileOutputCommitter.
> The code there looked like:
> {code}
>     if (fs.isFile(taskOutput)) {
>         … do stuff …
>     } else if(fs.getFileStatus(taskOutput).isDir()) {
>         … do stuff …
>     }
> {code}
> And what we saw happening is that for the problem jobs neither path was being exercised.  I've attached the task log of our instrumented code.  In this version we added an else statement and printed out the line "THIS SEEMS LIKE WE SHOULD NEVER GET HERE …".
> The root cause of this seems to be an eventual consistency issue with S3.  You can see in the log that the first time moveTaskOutputs is called it finds that the taskOutput is a directory.  It goes into the isDir() branch and successfully retrieves the list of files in that directory from S3 (in this case just one file).  This triggers a recursive call to moveTaskOutputs for the file found in the directory.
> But in this pass through moveTaskOutputs the temporary output file can't be found, resulting in both branches of the above if statement being skipped and the temporary file never being moved to the final output location.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
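
The fall-through described in the quoted report can be illustrated with a small, self-contained Java sketch. It does not use the Hadoop API and is not taken from either attached patch. `FlakyStore`, `Kind`, `moveSilently`, and `moveOrFail` are hypothetical names invented here, and the "hidden for N lookups" behaviour is only an assumed stand-in for a path that was listed but is not yet visible. The first method mirrors the shape of the quoted if / else if check; the second shows one possible mitigation, which is to retry the lookup and fail loudly rather than skip the file.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; every name here is hypothetical, not Hadoop code. */
public class MoveTaskOutputsSketch {

    enum Kind { FILE, DIR }

    /** Simulated store in which a path can stay invisible for a few lookups. */
    static class FlakyStore {
        private final Map<String, Kind> entries = new HashMap<>();
        private final Map<String, Integer> hiddenLookups = new HashMap<>();

        void put(String path, Kind kind, int lookupsBeforeVisible) {
            entries.put(path, kind);
            hiddenLookups.put(path, lookupsBeforeVisible);
        }

        /** Returns null while the path is still hidden or if it was never stored. */
        Kind lookup(String path) {
            int remaining = hiddenLookups.getOrDefault(path, 0);
            if (remaining > 0) {
                hiddenLookups.put(path, remaining - 1);
                return null;
            }
            return entries.get(path);
        }
    }

    /** Same shape as the quoted check: a missing path matches neither branch. */
    static String moveSilently(FlakyStore store, String path) {
        Kind kind = store.lookup(path);
        if (kind == Kind.FILE) {
            return "moved file";
        } else if (kind == Kind.DIR) {
            return "recursed into directory";
        }
        return "skipped"; // the silent fall-through the report describes
    }

    /** Assumed mitigation: retry the lookup, then throw instead of skipping. */
    static String moveOrFail(FlakyStore store, String path, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Kind kind = store.lookup(path);
            if (kind == Kind.FILE) {
                return "moved file";
            }
            if (kind == Kind.DIR) {
                return "recursed into directory";
            }
        }
        throw new IllegalStateException("Task output not visible: " + path);
    }

    public static void main(String[] args) {
        FlakyStore store = new FlakyStore();
        store.put("part-00042", Kind.FILE, 1); // hidden for the first lookup
        System.out.println(moveSilently(store, "part-00042"));
        store.put("part-00043", Kind.FILE, 1);
        System.out.println(moveOrFail(store, "part-00043", 3));
    }
}
```

In this simulation the first call reports "skipped" because the only lookup happens while the file is still hidden, while the retrying variant finds the file on its second attempt. A real fix would also need to decide how long to wait between attempts, which this sketch leaves out.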