hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files
Date Wed, 30 Mar 2011 20:18:54 GMT
On Wed, Mar 30, 2011 at 3:31 PM, V.Senthil Kumar <vaisen2000@yahoo.com> wrote:
> Thanks for the suggestion. The query created just one result file.
>
> Also, before trying this query, I have found out another way of making this
> work. I have added the following properties in hive-site.xml and it worked as
> well. It created just one result file.
>
>
> <property>
>  <name>hive.merge.mapredfiles</name>
>  <value>true</value>
>  <description>Merge small files at the end of a map-reduce job</description>
> </property>
>
> <property>
>  <name>hive.input.format</name>
>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>  <description>The default input format, if it is not specified, the system
> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
> whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always
> overwrite it - if there is a bug in CombineHiveInputFormat, it can always be
> manually set to HiveInputFormat. </description>
> </property>
>
>
>
> ----- Original Message ----
> From: Jov <zhao6014@gmail.com>
> To: user@hive.apache.org
> Sent: Tue, March 29, 2011 10:22:32 PM
> Subject: Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files
>
> Try adding a limit:
>
> INSERT OVERWRITE LOCAL DIRECTORY
> '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
> Select host, identity, user, time, request
> from raw_apachelog
> where ds = '2011-03-22-001500' limit 32;
>
>
> 2011/3/30 V.Senthil Kumar <vaisen2000@yahoo.com>:
>> Hello,
>>
>> I have a hive query which does a simple select and writes the results to a
>> local file system.
>>
>>
>> For example, a query like this,
>>
>> INSERT OVERWRITE LOCAL DIRECTORY
>> '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
>> Select host, identity, user, time, request
>> from raw_apachelog
>> where ds = '2011-03-22-001500';
>>
>> Now this creates two files under the apachetest folder. This table has only 32
>> rows. Is there any way I can make Hive create only a single file?
>>
>>
>> Appreciate your help :)
>>
>> Thanks,
>> Senthil
>>
>
>

The number of files is a result of the number of reducers used in the
job. Adding a limit adds a single-reducer phase to the end of the job.
You should be able to accomplish the same thing with 'set
mapred.reduce.tasks=1'.
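
For reference, a minimal sketch of what that could look like in a Hive
session, reusing the query from the original post. One assumption to
flag: the setting only matters if the job actually has a reduce phase.
If Hive plans this simple select as a map-only job, the file count
follows the number of map tasks and this setting alone may not change
it.

```sql
-- Sketch only: run in the same Hive session as the query.
-- Request a single reduce task for the jobs that follow in this session.
set mapred.reduce.tasks=1;

-- Same query as in the original post, without the LIMIT.
-- Assumption: the plan includes a reduce phase; a map-only plan
-- would not be affected by the setting above.
INSERT OVERWRITE LOCAL DIRECTORY
'/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
SELECT host, identity, user, time, request
FROM raw_apachelog
WHERE ds = '2011-03-22-001500';
```

The set command applies to the current session only, so it does not
require editing hive-site.xml.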
