Subject: Re: Is perfect control over mapper num AND split distribution possible?
From: Keith Wiley
Date: Tue, 21 Jan 2014 12:42:16 -0800
To: user@hadoop.apache.org

I'll look it up. Thanks.

On Jan 21, 2014, at 11:43, java8964 wrote:

> You cannot use hadoop "NLineInputFormat"?
>
> If you generate 100 lines of text file, by default, one line will trigger one mapper task.
>
> As long as you have 100 task slot available, you will get 100 mapper running concurrently.
>
> You want perfect control over mapper num? NLineInputFormat is designed for your purpose.
>
> Yong

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com     music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
                                           --  Keith Wiley
________________________________________________________________________________