hive-user mailing list archives

From Leo Alekseyev <>
Subject Hive produces very small files despite hive.merge...=true settings
Date Thu, 18 Nov 2010 02:00:39 GMT
I have jobs that sample (or generate) a small amount of data from a
large table. At the end I get, e.g., 3000 or more files of about 1 KB
each, which becomes a nuisance. How can I make Hive do another pass
to merge the output? I have the following settings:


After setting hive.merge* to true, Hive started reporting "Total
MapReduce jobs = 2". However, after generating the
lots-of-small-files table, Hive says:
Ended Job = job_201011021934_1344
Ended Job = 781771542, job is filtered out (removed at runtime).

Is there a way to force the merge, or am I missing something?
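To put numbers on the output described above, here is a minimal Python sketch that tallies file sizes the way a size-gated merge step might. It assumes, hypothetically, that the merge stage runs only when there is more than one file and the average output file size is below a threshold. Later Hive releases expose such a threshold as `hive.merge.smallfiles.avgsize`, but whether the version in use here honors it is not established. The helper names and the 16 MB threshold are illustrative, not Hive's actual code.

```python
import os

def output_stats(sizes):
    """Return (file count, total bytes, average bytes) for a list of file sizes."""
    count = len(sizes)
    total = sum(sizes)
    avg = total / count if count else 0.0
    return count, total, avg

def would_merge(sizes, avg_threshold=16_000_000):
    """Hypothetical gate (an assumption, not Hive's code): merge only when there
    is more than one file and the average size is below avg_threshold bytes."""
    count, _, avg = output_stats(sizes)
    return count > 1 and avg < avg_threshold

def dir_sizes(path):
    """Sizes of the regular files directly inside a local directory,
    e.g. a copy of the table's output fetched with `hadoop fs -get`."""
    return [os.path.getsize(os.path.join(path, name))
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))]

if __name__ == "__main__":
    sizes = [1024] * 3000  # roughly the case described: ~3000 files of ~1 KB
    count, total, avg = output_stats(sizes)
    print(count, total, avg)   # 3000 3072000 1024.0
    print(would_merge(sizes))  # True
```

With 3000 files of about 1 KB the whole output is only about 3 MB, so under this assumption a size gate alone would pass and the merge would be expected to run.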
