From: Harsh J
Date: Sun, 26 Dec 2010 21:50:41 +0530
Subject: Re: Custom input split
To: common-user@hadoop.apache.org

Hi,

On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS) wrote:
> I assume there's a way to make a specific # of splits and add each document to the separate splits...but I'll be darned if I can find the docs or an example to show this.

Would CombineFileInputFormat and CombineFileSplit be what you're looking for?

Doc links:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html
&
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileSplit.html

> As I said I'm using hadoop-0.20.2 which I know makes a difference as so many things get deprecated on each release. Old references don't seem to work.

The API marked deprecated in 0.20.{0,1,2} has been un-deprecated in the 0.21.0 release and is also considered as the "stable" API. You can continue using it, as it is still supported.
(Maybe 0.20.3 will have them un-deprecated too; I'm not sure what the status is on that, although doing so would surely help avoid beginner confusion.)

--
Harsh J
www.harshj.com
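
To make the "specific # of splits" idea from the quoted question concrete, here is a minimal sketch in plain Java. It is an illustration only and does not use or reproduce Hadoop's CombineFileInputFormat or CombineFileSplit; the class name FixedSplitSketch, the method assign, and the file names and sizes are all hypothetical. It places each document into whichever of a fixed number of groups currently holds the fewest bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, not Hadoop's CombineFileInputFormat.
// All names here (FixedSplitSketch, assign) are hypothetical.
public class FixedSplitSketch {

    // Assigns each document to one of numSplits groups, always choosing the
    // group with the smallest total size so far (ties go to the lowest index).
    public static List<List<String>> assign(String[] docs, long[] sizes, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        if (docs.length != sizes.length) {
            throw new IllegalArgumentException("docs and sizes must have equal length");
        }

        List<List<String>> splits = new ArrayList<>();
        long[] totals = new long[numSplits];
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }

        for (int d = 0; d < docs.length; d++) {
            // Find the group that currently holds the fewest bytes.
            int target = 0;
            for (int s = 1; s < numSplits; s++) {
                if (totals[s] < totals[target]) {
                    target = s;
                }
            }
            splits.get(target).add(docs[d]);
            totals[target] += sizes[d];
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up document names and sizes, for demonstration only.
        String[] docs = {"a.txt", "b.txt", "c.txt", "d.txt", "e.txt"};
        long[] sizes = {40, 10, 30, 20, 25};

        List<List<String>> splits = assign(docs, sizes, 2);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split " + i + ": " + splits.get(i));
        }
    }
}
```

In a real job the grouping would be done by the input format itself; see the Javadoc links above for what the 0.20.2 classes actually provide.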