Subject: RE: Compression using Hadoop...
Date: Fri, 31 Aug 2007 11:40:22 -0700
From: "Ted Dunning"
To: hadoop-user@lucene.apache.org

My 10x was very rough. I based it on:

a) you want a few files per map task
b) you want a map task per core

I tend to use quad-core machines, and so I used 2 x 8 = 16 (roughly 10). On EC2, you don't have multi-core machines (I think), so you might be fine with 2-4 files per CPU.

-----Original Message-----
From: C G [mailto:parallelguy@yahoo.com]
Sent: Fri 8/31/2007 11:21 AM
To: hadoop-user@lucene.apache.org
Subject: RE: Compression using Hadoop...

> Ted, from what you are saying I should be using at least 80 files given the cluster size, and I should modify the loader to be aware
> of the number of nodes and split accordingly. Do you concur?
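
To make the rule of thumb above concrete, here is a minimal sketch of the arithmetic in Java. The numbers (nodes, cores per node, files per core) are hypothetical placeholders you would swap in for your own cluster; nothing here is read from Hadoop itself.

    // Back-of-the-envelope sizing: a few files per map task, one map task
    // per core, so target file count ~= filesPerCore * coresPerNode * nodes.
    public class FileCountEstimate {
        public static void main(String[] args) {
            int nodes = 10;          // hypothetical cluster size
            int coresPerNode = 8;    // e.g. two quad-core CPUs per node
            int filesPerCore = 2;    // "a few files per map task"

            int targetFiles = filesPerCore * coresPerNode * nodes;
            System.out.println("Aim for about " + targetFiles + " input files");
        }
    }

For a 10-node cluster of 8-core machines this suggests about 160 files; with 2-4 files per CPU on single-core EC2 instances, the same arithmetic gives 20-40 files for 10 nodes.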