Subject: Re: split into less files
From: Matt Tucker
Date: Tue, 8 Nov 2011 22:05:36 -0500
To: user@hive.apache.org

It sounds like you want to look at setting hive.merge.mapredfiles to true in your hive-site.xml.

Just be aware that it will likely add another map step to your jobs to consolidate the files.
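
For reference, a minimal sketch of what that could look like in hive-site.xml. Only hive.merge.mapredfiles comes from the suggestion above; hive.merge.size.per.task is a related Hive property included here as an assumption about what you may also want to tune, and the 200000000 value is purely illustrative (picked to match the ~200 MB target mentioned in the thread), not a recommendation.

```xml
<!-- Merge small output files at the end of a map-reduce job. -->
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
</property>

<!-- Assumption: target size in bytes for merged files; value is illustrative only. -->
<property>
  <name>hive.merge.size.per.task</name>
  <value>200000000</value>
</property>
```

If you would rather not change the site-wide config, the same property can be set for a single session with `SET hive.merge.mapredfiles=true;` before running the INSERT OVERWRITE.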

Matt Tucker



On Nov 8, 2011, at 6:19 PM, Shouguo Li <the1plummie@gmail.com> wrote:

I think that has to do with your configured block size. Check your value for dfs.block.size in /hdfs-site.xml.
But just curious, why would the number of files matter for your use case?


On Fri, Oct 21, 2011 at 1:18 AM, Vikas Srivastava <vikas.srivastava@one97.net> wrote:
Hey All,


I have a table with a single partition, and that partition holds around 100 files of 200 MB each. When I overwrite this into another table, it makes 100 files of 20 MB (compressed). What I want is for it to make only 1, 2, or 10 files of 200 MB or 100 MB.

That is, after the overwrite it should make fewer files than the uncompressed table has.




--
With Regards
Vikas Srivastava

DWH & Analytics Team
Mob:+91 9560885900
One97 | Let's get talking !

