Subject: Re: Why LineRecordWriter.write(..) is synchronized
From: Niels Basjes
To: user@hadoop.apache.org
Date: Thu, 8 Aug 2013 13:00:40 +0200

I may be nitpicking here, but if "perhaps the answer is no", then I conclude: perhaps the other implementations of RecordWriter are a race condition/file corruption waiting to happen.
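To make the concern concrete, here is a minimal sketch in plain Java. It is not Hadoop's actual code: `SharedLineWriterSketch`, `LineWriter` and `writeConcurrently` are made-up names for illustration, and the only assumption taken from this thread is that one record is emitted as several separate writes (key, separator, value, newline) to a stream that multiple threads may share.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedLineWriterSketch {

    /** Hypothetical stand-in for a line-oriented record writer; not Hadoop's class. */
    static final class LineWriter {
        private final OutputStream out;

        LineWriter(OutputStream out) {
            this.out = out;
        }

        // One record is emitted as four separate writes. Without "synchronized",
        // two threads sharing this writer could interleave those writes and
        // produce lines that mix parts of different records.
        synchronized void write(String key, String value) throws IOException {
            out.write(key.getBytes(StandardCharsets.UTF_8));
            out.write('\t');
            out.write(value.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    }

    /** Writes records from several threads through one shared writer and returns the lines. */
    public static List<String> writeConcurrently(int threads, int recordsPerThread) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        LineWriter writer = new LineWriter(sink);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final String key = "thread" + t;
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < recordsPerThread; i++) {
                        writer.write(key, "record" + i);
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        // Wait for all writers to finish before reading the sink.
        for (Thread worker : workers) {
            worker.join();
        }
        String text = new String(sink.toByteArray(), StandardCharsets.UTF_8);
        return Arrays.asList(text.split("\n"));
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = writeConcurrently(4, 1000);
        long malformed = lines.stream()
                .filter(line -> !line.matches("thread\\d+\\trecord\\d+"))
                .count();
        System.out.println(lines.size() + " lines, " + malformed + " malformed");
    }
}
```

With the `synchronized` keyword in place, every line should come out as one intact key/value pair. Removing it makes interleaved lines possible, though a given run may or may not show them, which is the kind of latent corruption I mean.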


On Thu, Aug 8, 2013 at 12:50 PM, Harsh J <harsh@cloudera.com> wrote:

While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking whether it is necessary, then perhaps the answer is no.

On Aug 8, 2013 3:43 PM, "Azuryy Yu" <azuryyyu@gmail.com> wrote:

It's not threads forked by Hadoop; we may create a line record writer, then call this writer concurrently.

On Aug 8, 2013 4:00 PM, "Sathwik B P" <sathwik.bp@gmail.com> wrote:
Hi,
Thanks for your reply.
May I know where Hadoop forks multiple threads to use a single RecordWriter?

regards,
sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:

Because we may use multiple threads to write a single file.

On Aug 8, 2013 2:54 PM, "Sathwik B P" <sathwik@apache.org> wrote:
Hi,

LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations that define write as synchronized.
Is there any specific reason for this?

regards,
sathwik




--
Best regards / Met vriendelijke groeten,

Niels Basjes