From: souravm
To: core-user@hadoop.apache.org
Date: Fri, 14 Nov 2008 06:25:30 +0530
Subject: Any suggestion on performance improvement ?

Hi,

I'm testing with a 4-node Hadoop HDFS setup. Each node has 2 GB of memory, a dual-core CPU, and around 30-60 GB of disk space.

I've stored files of different sizes in HDFS, ranging from 10 MB to 5 GB.

I'm querying those files using Pig. What I'm seeing is that even a simple select query (LOAD and FILTER) takes at least 30-40 seconds. The map task on one node alone takes at least 25 seconds.

I've set the JVM max heap size to 1024m.

Any suggestions on how to improve performance through configuration at the Hadoop level (by changing HDFS and MapReduce parameters)?
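
For context on the file sizes above, here is a rough back-of-envelope sketch of how many map tasks a single file might produce and how many rounds ("waves") of tasks the 4-node cluster would need. It assumes a 64 MB HDFS block size and 2 map slots per node; these values, and the helper names map_tasks and map_waves, are illustrative assumptions, not settings stated in this message. It also simplifies split computation to one split per block.

```python
import math

# Assumed values (not stated in the message): 64 MB block size and
# 2 concurrent map slots per node. NODES comes from the 4-node setup above.
BLOCK_SIZE_MB = 64
MAP_SLOTS_PER_NODE = 2
NODES = 4


def map_tasks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Approximate map tasks for one file: one per block-sized split."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


def map_waves(file_size_mb: int) -> int:
    """Rounds of map tasks needed when every map slot in the cluster is busy."""
    slots = NODES * MAP_SLOTS_PER_NODE
    return math.ceil(map_tasks(file_size_mb) / slots)


for size_mb in (10, 5 * 1024):
    print(f"{size_mb} MB -> {map_tasks(size_mb)} map tasks, {map_waves(size_mb)} wave(s)")
```

Under these assumptions, a 10 MB file fits in a single map task, while a 5 GB file needs about 80 map tasks, or roughly 10 waves across 8 slots.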
Regards,
Sourav