Date: Fri, 11 Jun 2010 10:26:17 +0800
Subject: Re: set mapred.map.tasks=1 not work
From: wd
To: hive-user@hadoop.apache.org

I've tried JVM reuse too; it made no difference.

The total time is about 130 s, for only 10 MB of data, all in small files, on 2 nodes.

Hive/Hadoop still runs 350+ map tasks...
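A likely reason `set mapred.map.tasks=1` (the subject of this thread) has no effect: for file-based tables, mapred.map.tasks is only a hint to the InputFormat, and Hadoop's FileInputFormat never puts two files into one split, so a table stored as 350+ small files gets at least 350+ map tasks. Likewise, the hive.merge.* settings act on the files a query writes, not the files it reads, so they cannot change the map count of a group-by whose result goes to the console. One option is to compact t1 once. The sketch below assumes t1 is an unpartitioned table that is safe to rewrite (try it on a copy first); the size value is illustrative.

```sql
-- Hypothetical one-off compaction of t1 (assumes an unpartitioned table).
-- hive.merge.mapfiles: merge small output files at the end of a map-only job.
-- hive.merge.size.per.task: target size, in bytes, of each merged output file.
set hive.merge.mapfiles=true;
set hive.merge.size.per.task=256000000;
-- Rewrite the table onto itself; "select *" is map-only, so the merge step applies.
insert overwrite table t1 select * from t1;
```

After this, the group-by should read far fewer input files; whether that brings the runtime down on this 2-node cluster has not been tested here.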

2010/6/10 Edward Capriolo <edlinuxguru@gmail.com>

Also consider setting up JVM reuse; this will remove some of the mapper
startup penalty.
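For reference, JVM reuse in Hadoop 0.19/0.20 is controlled by a single job property that can be set per session from the Hive CLI. The sketch below lets one task JVM run any number of tasks from the same job instead of the default of one. Note that reuse only saves JVM start-up time; it does not reduce the number of map tasks, so with hundreds of tiny tasks the remaining per-task overhead can still dominate, which may explain why it did not help in this case.

```sql
-- Number of tasks a single task JVM may run for one job:
-- 1 (the default) starts a fresh JVM per task, -1 removes the limit.
set mapred.job.reuse.jvm.num.tasks=-1;
select a, count(1) from t1 group by a;
```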

How long is your query taking, and how much data is there? How many nodes?

On Thursday, June 10, 2010, wd <wd@wdicc.com> wrote:
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> and
>
> set hive.merge.size.per.task=1000000;
> set hive.merge.mapfiles=true;
>
> All of these seem useless here: the time taken to execute 'select a, count(1) from t1 group by a' is almost the same.
>
> Have I missed some other settings?
>
> 2010/6/10 wd <wd@wdicc.com>
>
> Thanks everyone, I'll try CombineHiveInputFormat. :)
>
> 2010/6/10 Namit Jain <njain@facebook.com>
>
>
> CombineHiveInputFormat
>
>
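On the "Have I missed some other settings?" question: CombineHiveInputFormat builds on Hadoop's CombineFileInputFormat, which groups files into splits according to size limits. A commonly used combination is sketched below. The byte values are illustrative, not tuned for this cluster, and the property names are the Hadoop 0.20 ones; we are not certain this particular Hive/Hadoop version supports combining, so if the job still launches 350+ maps with these settings, the combine path is probably not in effect.

```sql
-- Combine many small files into fewer splits (values in bytes, illustrative).
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Upper bound on the size of one combined split.
set mapred.max.split.size=256000000;
-- Minimum bytes to gather per node / per rack before forming a split there.
-- Setting all three to the same value keeps them consistent with each other.
set mapred.min.split.size.per.node=256000000;
set mapred.min.split.size.per.rack=256000000;
select a, count(1) from t1 group by a;
```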
