hive-user mailing list archives

From Chen Wang <chen.apache.s...@gmail.com>
Subject Re: Hadoop streaming with insert dynamic partition generate many small files
Date Mon, 03 Feb 2014 06:55:48 GMT
It seems that hive.exec.reducers.bytes.per.reducer was still not big
enough: I added another 0, and now I get only one file under each
partition.
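
For reference, the session-level change described above can be sketched as follows (the exact value assumes "adding another 0" to the 1,280,000,000 bytes quoted below):

```sql
-- Raising the bytes-per-reducer threshold makes Hive plan fewer reducers.
-- With a dynamic-partition insert, each reducer writes its own file into
-- every partition it touches, so fewer reducers means fewer files.
SET hive.exec.reducers.bytes.per.reducer=12800000000;
```

Setting mapred.reduce.tasks=1 would also force a single reducer (and hence a single file per partition), at the cost of all reduce-side parallelism.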


On Sun, Feb 2, 2014 at 10:14 PM, Chen Wang <chen.apache.solr@gmail.com> wrote:

> Hi,
> I am using java reducer reading from a table, and then write to another
> one:
>
>   FROM (
>       FROM (
>           SELECT column1,...
>           FROM table1
>           WHERE ( partition > 6 and partition < 12 )
>       ) A
>       MAP A.column1, A....
>       USING 'java -cp .:my.jar mymapper.mymapper'
>       AS key, value
>       CLUSTER BY key
>   ) map_output
>   INSERT OVERWRITE TABLE target_table PARTITION(partition)
>   REDUCE
>       map_output.key,
>       map_output.value
>   USING 'java -cp .:myjar.jar myreducer.myreducer'
>   AS column1, column2;
>
> It's all working fine, except that many (20-30) small files are
> generated under each partition. I am setting SET
> hive.exec.reducers.bytes.per.reducer=1280000000; hoping to get one
> file that is big enough under each partition, but it does not seem to
> have any effect. I still get 20-30 small files under each folder, and
> each file is around 7 KB.
>
> How can I force Hive to generate only one big file per partition? Does
> this have anything to do with the streaming? I recall that in the past,
> when I read directly from a table with a UDF and wrote to another
> table, it generated only one big file for the target partition. Not
> sure why that is.
>
>
> Any help appreciated.
>
> Thanks,
>
> Chen
>
>
>
>
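
A different approach to the small-file question in the quoted message is Hive's post-job merge step (a sketch only; these settings exist in Hive of this era, but whether the merge job runs for dynamic-partition inserts depends on the Hive version):

```sql
-- Ask Hive to run an extra merge job after the map-reduce job when the
-- average output file size falls below a threshold, combining the many
-- per-reducer files into fewer, larger ones.
SET hive.merge.mapredfiles=true;              -- merge outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=128000000;  -- trigger merge when avg file < ~128 MB
SET hive.merge.size.per.task=256000000;       -- target size of merged files
```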
