From: Radim Kolar
Date: Thu, 13 Sep 2012 00:51:37 +0200
To: user@hadoop.apache.org
Subject: multipleoutputs does not like speculative execution in map-only job

With speculative execution enabled, Hadoop can run attempts of the same task on more than one node. If the mapper uses MultipleOutputs, the second attempt (or sometimes even all attempts) fails to create its output file because the file is already being created by another attempt:

attempt_1347286420691_0011_m_000000_0
attempt_1347286420691_0011_m_000000_1

.. fails with

Error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /cznewgen/segments/20120907190053/parse_db/-m-00000

In my code I am using mos.write with 4 arguments.

This problem is discussed in the javadoc for FileOutputFormat.getWorkOutputPath. Is it possible to change MultipleOutputs to take advantage of this function? Or would it be better to change FileOutputFormat.getUniqueFile() to append the last digit of the attempt ID to the filename, so that it creates unique names such as /cznewgen/segments/20120907190053/parse_db/-m-00000_0 ?
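
To make the second idea concrete, here is a rough standalone sketch of the naming scheme I have in mind. It is plain Java with no Hadoop dependency; the class AttemptSuffix and both of its methods are hypothetical names I made up for illustration, not existing Hadoop API. In a real patch the attempt number would presumably be taken from the task attempt ID in the task context rather than parsed from a string.

```java
// Hypothetical illustration only: none of these names exist in Hadoop.
class AttemptSuffix {

    // Returns the trailing attempt number of an ID such as
    // "attempt_1347286420691_0011_m_000000_1" (here: 1).
    static int attemptNumber(String attemptId) {
        int idx = attemptId.lastIndexOf('_');
        if (idx < 0 || idx == attemptId.length() - 1) {
            throw new IllegalArgumentException("not an attempt id: " + attemptId);
        }
        return Integer.parseInt(attemptId.substring(idx + 1));
    }

    // Appends "_<attempt number>" to the file name so that two attempts
    // of the same task never try to create the same file.
    static String uniqueName(String fileName, String attemptId) {
        return fileName + "_" + attemptNumber(attemptId);
    }

    public static void main(String[] args) {
        String file = "/cznewgen/segments/20120907190053/parse_db/-m-00000";
        System.out.println(uniqueName(file, "attempt_1347286420691_0011_m_000000_0"));
        System.out.println(uniqueName(file, "attempt_1347286420691_0011_m_000000_1"));
    }
}
```

With this scheme the two attempts listed above would write to different files, one ending in -m-00000_0 and the other in -m-00000_1, instead of colliding on -m-00000.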