From: Dhruv
To: user@hadoop.apache.org
Date: Thu, 1 Nov 2012 15:45:21 -0700
Subject: OutputFormat and Reduce Task

I'm trying to optimize the performance of my OutputFormat implementation. I'm doing something similar to HBase's TableOutputFormat, sending the reducer's output to a distributed key-value store, so each context.write() call ends up doing a put() on the store.
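
For reference, here is a minimal sketch of the kind of writer described above, where every write() becomes one blocking put(). KvStore, InMemoryKvStore and BlockingKvWriter are hypothetical stand-ins, not Hadoop or HBase classes. In a real job the writer would extend org.apache.hadoop.mapreduce.RecordWriter and be returned from the OutputFormat's getRecordWriter().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical store client (not a real library API): put() returns only
// after the store has accepted the pair.
interface KvStore {
    void put(String key, String value);
}

// Toy in-memory store so the sketch runs without a cluster.
class InMemoryKvStore implements KvStore {
    final Map<String, String> data = new LinkedHashMap<>();

    @Override
    public synchronized void put(String key, String value) {
        data.put(key, value);
    }
}

// Same shape as RecordWriter's write()/close(): each write() is a
// synchronous put(), so whoever calls write() waits for the store.
class BlockingKvWriter {
    private final KvStore store;

    BlockingKvWriter(KvStore store) {
        this.store = store;
    }

    void write(String key, String value) {
        store.put(key, value); // the caller is blocked for the whole put()
    }

    void close() {
        // Nothing is buffered, so there is nothing to flush.
    }
}
```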

I haven't profiled yet, but a series of thread dumps taken on the reduce tasks shows the threads RUNNABLE and sitting in put() and the methods it calls. So I decoupled the two with a producer-consumer pattern built on an ExecutorService, with context.write() as the producer and RecordWriter.write() as the consumer.
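
One way that decoupling might look, as a sketch assuming a hypothetical blocking KvStore client: write() hands each pair to a pool created with Executors.newFixedThreadPool(), a Semaphore caps the number of puts in flight so the reducer cannot run arbitrarily far ahead of the store, and close() waits for the pool to drain and rethrows the first failed put. The class names and the thread and in-flight limits are illustrative assumptions, not part of Hadoop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical blocking store client (assumption, not a real API).
interface KvStore {
    void put(String key, String value);
}

// Toy thread-safe store so the sketch runs without a cluster.
class ConcurrentKvStore implements KvStore {
    final Map<String, String> data = new ConcurrentHashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }
}

// Producer-consumer writer: write() (the producer, reached via
// context.write()) queues each pair; pool threads (the consumers)
// perform the blocking put() calls.
class AsyncKvWriter {
    private final KvStore store;
    private final ExecutorService pool;
    private final Semaphore inFlight; // caps queued + running puts
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    AsyncKvWriter(KvStore store, int threads, int maxInFlight) {
        this.store = store;
        this.pool = Executors.newFixedThreadPool(threads);
        this.inFlight = new Semaphore(maxInFlight);
    }

    void write(String key, String value) {
        rethrowIfFailed();
        inFlight.acquireUninterruptibly(); // back-pressure on the reducer
        try {
            pool.execute(() -> {
                try {
                    store.put(key, value);
                } catch (Throwable t) {
                    failure.compareAndSet(null, t); // remember the first error
                } finally {
                    inFlight.release();
                }
            });
        } catch (RuntimeException e) { // e.g. RejectedExecutionException
            inFlight.release();
            throw e;
        }
    }

    // Stops accepting work, waits for pending puts, then surfaces any failure.
    void close() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for pending puts");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for puts", e);
        }
        rethrowIfFailed();
    }

    private void rethrowIfFailed() {
        Throwable t = failure.get();
        if (t != null) {
            throw new IllegalStateException("an asynchronous put failed", t);
        }
    }
}
```

In a real RecordWriter the draining logic would belong in close(TaskAttemptContext), so the task does not finish while puts are still pending. Puts can complete out of order across worker threads, so this only fits stores where that is acceptable (for example, when each key is written at most once).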

My understanding is that Context.write() calls RecordWriter.write() synchronously, so the first blocks until the second completes. Each reduce() call therefore blocks until its context.write() finishes, which also delays the reduce() call for the next key and makes things slow in my case. Is this correct?

Does this mean that the TaskTracker instantiates the OutputFormat once for the job's reduce logic, so that all keys processed by the reducers share the same OutputFormat instance? Or is a new OutputFormat instantiated for each key the reducer processes?
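
The flow in question can be modeled in a few lines. This is a simplified model, not Hadoop's actual ReduceTask code, and it assumes the usual behavior in which a reduce task obtains one RecordWriter from its OutputFormat, reuses it for every key, and closes it when the task ends. PairWriter, RecordingWriter and ReduceTaskModel are hypothetical names. Because write() runs on the same thread as the reduce loop, the next key is not processed until the previous write() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for RecordWriter: write() for each pair, close() once.
interface PairWriter {
    void write(String key, String value);
    void close();
}

// Logs calls in order; write() runs on the caller's thread, so the
// caller cannot move on until it returns.
class RecordingWriter implements PairWriter {
    final List<String> log = new ArrayList<>();

    @Override
    public void write(String key, String value) {
        log.add(key + "=" + value);
    }

    @Override
    public void close() {
        log.add("closed");
    }
}

class ReduceTaskModel {
    // One "reduce task": get a single writer up front (the role of
    // getRecordWriter()), run the reduce logic once per key, close at the end.
    static void run(Map<String, List<Integer>> groupedInput, Supplier<PairWriter> newWriter) {
        PairWriter writer = newWriter.get(); // once per task, not once per key
        for (Map.Entry<String, List<Integer>> entry : groupedInput.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v; // body of reduce() for this key
            }
            writer.write(entry.getKey(), Integer.toString(sum)); // context.write()
        }
        writer.close();
    }
}
```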

Thanks,
Dhruv