From: Sathwik B P
To: user@hadoop.apache.org
Date: Sun, 11 Aug 2013 13:26:26 +0530
Subject: Re: Why LineRecordWriter.write(..) is synchronized

Hi Harsh,

Does it make any sense to keep the method in LRW synchronized? Isn't it creating unnecessary overhead for non-multithreaded implementations?

regards,
sathwik

On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <harsh@cloudera.com> wrote:
I suppose I should have been clearer. There's no problem out of the box if
people stick to the libraries we offer :)

Yes the LRW was marked synchronized at some point over 8 years ago [1]
in support for multi-threaded maps, but the framework has changed much
since then. The MultithreadedMapper/etc. API we offer now
automatically shields the devs away from having to think of output
thread safety [2].

I can imagine there can only be a problem if a user writes their own
unsafe multi threaded task. I suppose we could document that in the
Mapper/MapRunner and Reducer APIs.

[1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
Commit added a synchronized to the write call.
[2] - MultithreadedMapper/etc. synchronize over the collector -
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
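
The "synchronize over the collector" idea in [2] can be illustrated with a minimal sketch in plain Java. This is not the Hadoop code: `SimpleWriter`, `PlainLineWriter`, and `LockingWriter` are hypothetical names made up for the illustration, and the sketch only assumes that one record is written as several separate stream writes, so that concurrent callers could interleave without a lock.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SyncWriterSketch {

    /** Hypothetical stand-in for a record writer; not Hadoop's RecordWriter API. */
    interface SimpleWriter {
        void write(String key, String value);
    }

    /** Not thread-safe: each record is emitted as four separate stream writes. */
    static final class PlainLineWriter implements SimpleWriter {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();

        @Override
        public void write(String key, String value) {
            emit(key);
            emit("\t");
            emit(value);
            emit("\n");
        }

        private void emit(String s) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }

        /** Returns the lines written so far; call only after all writers finished. */
        List<String> lines() {
            String text = new String(out.toByteArray(), StandardCharsets.UTF_8);
            return text.isEmpty() ? new ArrayList<String>() : Arrays.asList(text.split("\n"));
        }
    }

    /** Wrapper that serializes access, so the wrapped writer needs no lock itself. */
    static final class LockingWriter implements SimpleWriter {
        private final SimpleWriter delegate;

        LockingWriter(SimpleWriter delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(String key, String value) {
            synchronized (delegate) {
                delegate.write(key, value);
            }
        }
    }

    /** Starts several threads that share one locked writer and returns all lines. */
    public static List<String> run(int threads, int recordsPerThread) {
        PlainLineWriter sink = new PlainLineWriter();
        SimpleWriter shared = new LockingWriter(sink);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    shared.write("k" + id, "v" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sink.lines();
    }

    public static void main(String[] args) {
        System.out.println("lines written: " + run(4, 500).size());
    }
}
```

With the lock held by the wrapper, the inner writer is only ever entered by one thread at a time, which is the sense in which a single-threaded task gains nothing from a second lock inside the writer.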

On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> The sequence writer is also synchronized; I don't think this is bad.
>
> If you call the HDFS API to write concurrently, then it's necessary.
>
> On Aug 8, 2013 7:53 PM, "Jay Vyas" <jayunit100@gmail.com> wrote:
>>
>> Then is this a bug? Synchronization in the absence of any race condition is
>> normally considered "bad".
>>
>> In any case I'd like to know why this writer is synchronized whereas the
>> other ones are not. That is, I think, the point at issue: either the other
>> writers should be synchronized or else this one shouldn't be. Consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments.
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking whether it is necessary, then perhaps the answer is no.
>>
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <azuryyyu@gmail.com> wrote:
>>>
>>> It's not about Hadoop-forked threads; we may create a line record writer, then
>>> call this writer concurrently.
>>>
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sathwik.bp@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where Hadoop forks multiple threads to use a single
>>>> RecordWriter?
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>>>>>
>>>>> Because we may use multiple threads to write a single file.
>>>>>
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sathwik@apache.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations that define write as synchronized.
>>>>>> Is there any specific reason for this?
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>
>>>>
>



--
Harsh J
