Date: Mon, 11 Jan 2010 12:45:23 -0800
Subject: Re: isSplitable() deprecated
From: Steve Kuo
To: common-user@hadoop.apache.org

Ted,

You may want to consider LZO compression, which allows a compressed file to be split for map jobs. Gzip, on the other hand, is not splittable. Check out these links.

http://www.cloudera.com/blog/2009/11/17/hadoop-at-twitter-part-1-splittable-lzo-compression/
http://wiki.apache.org/hadoop/UsingLzoCompression

On Fri, Jan 8, 2010 at 1:13 PM, Ted Yu wrote:

> The input file is in .gz format
> FYI
>
> On Fri, Jan 8, 2010 at 11:08 AM, Ted Yu wrote:
>
> > My current project processes input file of size 333302161 bytes.
> > What I plan to do is to split the file into equal size pieces (and on blank
> > line boundary) to improve performance.
> >
> > I found 12 classes in 0.20.1 source code which implement InputSplit.
> >
> > If someone has written code similar to what I plan to do, please share some
> > hint.
> >
> > Thanks
> >
> >
> > On Fri, Jan 8, 2010 at 2:27 AM, Amogh Vasekar wrote:
> >
> >> Hi,
> >> The deprecation is due to the new evolving mapreduce ( o.a.h.mapreduce )
> >> APIs. Old APIs are supported for available distributions. The equivalent of
> >> TextInputFormat is available in new API :
> >>
> >> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
> >>
> >> Thanks,
> >> Amogh
> >>
> >>
> >> On 1/8/10 3:47 AM, "Ted Yu" wrote:
> >>
> >> According to:
> >>
> >> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/TextInputFormat.html#isSplitable%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path%29
> >>
> >> isSplitable() is deprecated.
> >>
> >> Which method should I use to replace it ?
> >>
> >> Thanks
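
To illustrate the remark in this thread that gzip is not splittable, here is a rough sketch that uses only the JDK's java.util.zip classes, not Hadoop itself. The class name GzipSplitDemo and its helper methods are made up for this example. It compresses some blank-line-separated records, then tries to decompress once from the start of the compressed bytes and once from the midpoint. Reading from the midpoint is expected to fail, because a gzip stream has to be decoded from its header onward, which is why a single .gz input cannot simply be cut into byte ranges for separate map tasks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Hadoop.
public class GzipSplitDemo {

    // Compress a byte array into a single in-memory gzip stream.
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Try to decompress starting at the given offset into the compressed bytes.
    // Returns the number of bytes recovered, or -1 if decoding fails from there.
    public static long decompressedLengthFrom(byte[] compressed, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed, offset, compressed.length - offset))) {
            byte[] chunk = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            // Covers a missing gzip header as well as corrupt or truncated data.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Build sample input: records separated by blank lines.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("record ").append(i).append("\n\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);

        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("read from start:  " + decompressedLengthFrom(compressed, 0));
        System.out.println("read from middle: "
                + decompressedLengthFrom(compressed, compressed.length / 2));
    }
}
```

As for the original question: in the new o.a.h.mapreduce API the corresponding hook is, as far as I know, the protected isSplitable(JobContext, Path) method inherited from org.apache.hadoop.mapreduce.lib.input.FileInputFormat, so an input format in the new API would override that rather than the deprecated mapred version.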