From: Ashish Paliwal
Date: Wed, 21 Dec 2016 18:28:44 +0530
Subject: Hadoop MultiOutputs API Issue
To: user@hadoop.apache.org

Hi,

Hadoop MapReduce version: 2.2.0

We are using MultipleOutputs to write multiple output files from a Mapper (there is no reducer). Our requirement is that these files go to a directory other than the job's default output directory, so we use the following MultipleOutputs method to write to a different directory:

public <K, V> void write(String namedOutput, K key, V value, String baseOutputPath)

If a map task runs for a long time, Hadoop starts a parallel attempt of the same task to finish it sooner, because speculative execution is enabled. Both attempts then try to write to the same file in the same directory. The second attempt fails with a "File already exists" error, and the job fails with it.

After analysis we found that, unlike the default context writer, *MultipleOutputs does not create any temporary directory*. It starts writing directly into the output directory. The reason is that the FileOutputCommitter used by the default context writer (and by the Application Master) is not the one used by the MultipleOutputs writer. So for MultipleOutputs, none of the FileOutputCommitter methods get called.
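
To make the failure mode concrete, here is a minimal sketch in plain Java that reproduces the collision outside Hadoop. It uses no Hadoop classes at all: the class name, the method names, the `_temporary` directory name and the attempt IDs are made up for illustration, and the "commit" step is only an assumption about how an attempt-scoped temporary directory avoids the clash, not a copy of what FileOutputCommitter actually does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for two speculative attempts of one map task.
class SpeculativeWriteSketch {

    // Direct write: every attempt targets the same final file.
    // Returns false when the file was already created by another attempt.
    static boolean writeDirect(Path outputDir, String fileName, String data) throws IOException {
        try {
            Files.write(outputDir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // this is where the second attempt ends up
        }
    }

    // Attempt-scoped write: each attempt writes under its own directory,
    // so two attempts never touch the same path while they are running.
    static Path writeToAttemptDir(Path outputDir, String attemptId, String fileName, String data)
            throws IOException {
        Path attemptDir = Files.createDirectories(outputDir.resolve("_temporary").resolve(attemptId));
        return Files.write(attemptDir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    // Commit: move one attempt's file to the final location.
    // Returns false if another attempt has already been committed.
    static boolean commit(Path attemptFile, Path outputDir) throws IOException {
        Path target = outputDir.resolve(attemptFile.getFileName());
        if (Files.exists(target)) {
            return false;
        }
        Files.move(attemptFile, target);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path direct = Files.createTempDirectory("direct");
        System.out.println(writeDirect(direct, "part-m-00000", "record")); // true
        System.out.println(writeDirect(direct, "part-m-00000", "record")); // false

        Path staged = Files.createTempDirectory("staged");
        Path first = writeToAttemptDir(staged, "attempt_0", "part-m-00000", "record");
        Path second = writeToAttemptDir(staged, "attempt_1", "part-m-00000", "record");
        System.out.println(commit(first, staged));  // true
        System.out.println(commit(second, staged)); // false, but neither write failed
    }
}
```

In the direct case the second attempt fails while writing, which is what we see in our job. In the staged case both attempts finish writing and only one result is kept.
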

So is this a known issue, or is it the default behavior? And what is the solution to this problem?

Regards,
Ashish.