hive-user mailing list archives

From Leo Alekseyev <dnqu...@gmail.com>
Subject Hive produces very small files despite hive.merge...=true settings
Date Thu, 18 Nov 2010 02:00:39 GMT
I have jobs that sample (or generate) a small amount of data from a
large table.  At the end, I get about 3000 or more output files of
roughly 1 KB each, which becomes a nuisance.  How can I make Hive do
another pass to merge the output?  I have the following settings:

hive.merge.mapfiles=true
hive.merge.mapredfiles=true
hive.merge.size.per.task=256000000
hive.merge.size.smallfiles.avgsize=16000000
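Hive's configuration documentation describes the average-size setting (listed there as `hive.merge.smallfiles.avgsize`, without the extra `.size`) as the threshold below which Hive starts an additional job to merge the output files, and `hive.merge.size.per.task` as the target size of the merged files. The following is a minimal Python sketch of that decision under those assumptions; `decide_merge` and `MergeDecision` are hypothetical names, and Hive's actual merge logic may check conditions not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergeDecision:
    run_merge: bool      # would the extra merge job run?
    avg_file_size: float # average size of the job's output files, in bytes
    merge_tasks: int     # approximate number of merged output files


def decide_merge(file_sizes: List[int],
                 smallfiles_avgsize: int = 16_000_000,
                 size_per_task: int = 256_000_000) -> MergeDecision:
    """Hypothetical model: merge when the average output file is below the threshold."""
    if not file_sizes:
        return MergeDecision(False, 0.0, 0)
    total = sum(file_sizes)
    avg = total / len(file_sizes)
    # Assumption: the documented rule "average output file size < threshold".
    run = avg < smallfiles_avgsize
    # Ceiling division: each merge task writes roughly size_per_task bytes.
    tasks = max(1, -(-total // size_per_task)) if run else 0
    return MergeDecision(run, avg, tasks)


if __name__ == "__main__":
    # Roughly the situation described: ~3000 files of ~1 KB each.
    d = decide_merge([1024] * 3000)
    print(d.run_merge, d.merge_tasks)  # True 1
```

Under these assumptions, ~3000 files of about 1 KB each fall far below the 16 MB threshold, so a merge into a single output file would be expected rather than skipped.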

After setting hive.merge* to true, Hive started reporting "Total
MapReduce jobs = 2".  However, after generating the table with lots
of small files, Hive says:
Ended Job = job_201011021934_1344
Ended Job = 781771542, job is filtered out (removed at runtime).
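One way to confirm which side of the average-size threshold the output actually lands on is to measure the files themselves. This is a minimal sketch that assumes the table's files have first been copied to a local directory (for example with `hadoop fs -get`); `output_file_stats` is a hypothetical helper, and skipping names that start with `_` or `.` is an assumption meant to ignore bookkeeping entries such as `_logs`.

```python
import os
from typing import Tuple


def output_file_stats(directory: str) -> Tuple[int, int, float]:
    """Return (file_count, total_bytes, average_bytes) for data files under `directory`."""
    count = 0
    total = 0
    for root, dirs, files in os.walk(directory):
        # Do not descend into hidden or bookkeeping directories (e.g. _logs).
        dirs[:] = [d for d in dirs if not d.startswith(("_", "."))]
        for name in files:
            # Skip hidden or bookkeeping files (e.g. _SUCCESS, .crc checksums).
            if name.startswith(("_", ".")):
                continue
            count += 1
            total += os.path.getsize(os.path.join(root, name))
    avg = total / count if count else 0.0
    return count, total, avg


if __name__ == "__main__":
    import sys
    n, total, avg = output_file_stats(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{n} files, {total} bytes total, {avg:.0f} bytes average")
```

If the reported average is well under the configured threshold, the output should have qualified for the merge pass.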

Is there a way to force the merge, or am I missing something?
--Leo
