Date: Fri, 14 Oct 2011 11:49:25 -0400
Subject: Re: Problems Mapping multigigabyte file
From: Justin Woody
To: mapreduce-user@hadoop.apache.org

Steve,

Is the input file splittable?

Justin

On Fri, Oct 14, 2011 at 11:23 AM, Steve Lewis wrote:
> I have an MR task which runs well with a single input file or an input
> directory with dozens of 50MB input files.
> When the data is in a single input file of 1 GB or more the mapper never
> gets to 0%. There are no errors, but when I look at the cluster, the CPUs
> are spending huge amounts of time in a wait state. The job runs when the
> input is 800MB and can complete even with a number of 500MB files as input.
> The cluster (0.02) has 8 nodes - 8 CPUs per node. Block size is 64MB.
> Any bright ideas?
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
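
Some background on why splittability matters here. With a file-based input format, a splittable file is normally cut into roughly one input split (and so one map task) per HDFS block, while a non-splittable file (for example, a gzip-compressed one) is handed whole to a single mapper. The sketch below only illustrates that arithmetic for the 64MB block size mentioned in the thread. The class and method names are hypothetical, not Hadoop APIs, and it assumes the default case where split size equals block size, ignoring minimum/maximum split-size settings and the framework's handling of a small trailing remainder.

```java
// Hypothetical helper, NOT part of Hadoop: a rough estimate of how many
// map tasks one input file produces when split size equals block size.
final class SplitEstimate {

    // Returns the estimated number of input splits for a single file.
    static long estimateSplits(long fileBytes, long blockBytes, boolean splittable) {
        if (blockBytes <= 0) {
            throw new IllegalArgumentException("blockBytes must be positive");
        }
        if (fileBytes <= 0) {
            return 0; // simplification: treat an empty file as producing no work
        }
        if (!splittable) {
            return 1; // the whole file goes to a single mapper
        }
        // Ceiling division: one split per (possibly partial) block.
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 1 GB file, 64MB blocks, splittable vs. not splittable.
        System.out.println(estimateSplits(1024 * mb, 64 * mb, true));
        System.out.println(estimateSplits(1024 * mb, 64 * mb, false));
        // The 800MB case from the original message.
        System.out.println(estimateSplits(800 * mb, 64 * mb, true));
    }
}
```

Under these assumptions a splittable 1 GB file spreads across 16 mappers, whereas a non-splittable one is read end to end by a single mapper, most of it over the network from other nodes. Whether that accounts for the behaviour described in the original message is not established in the thread.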