hive-dev mailing list archives

From "Kevin Wilfong" <kevinwilf...@fb.com>
Subject Re: Review Request: Make compression used between map reduce tasks configurable.
Date Fri, 02 Sep 2011 18:56:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1516/
-----------------------------------------------------------

(Updated 2011-09-02 18:56:14.079373)


Review request for hive and Ning Zhang.


Changes
-------

Made changes as suggested by nzhang.

I made the description of hive.exec.inter.mapred.compression.codec much more detailed, and
added a simple example.

I also set hive.exec.compress.intermediate to default to true, but I let hive.exec.inter.mapred.compression.codec
default to the Hadoop default value, so that the existing unit tests hit my new code path.
My new unit tests check that when hive.exec.inter.mapred.compression.codec is set to something
other than the Hadoop default value, that codec is used as intended.

This change required updating the expected output of every test affected by the new default
for hive.exec.compress.intermediate.


Summary
-------

I added a field to MapredWork and MapredLocalWork that indicates whether the work is intermediate.
For an insert query, a map reduce task is intermediate if at least one other map reduce task
is guaranteed to run after it and before the move.  If the query is not an insert, every task
is treated as intermediate.  I implement this by defaulting the flag to true and setting it
to false when the tasks that move the data into a table or file are generated.

If the work for a map reduce task (local or otherwise) is intermediate, we set the compression
used on the output of the reduce to a configured value; the default is LZO.
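
As a rough illustration of the behaviour described in this summary, here is a minimal,
self-contained sketch. It is not the code in the diff: the Work class, the chooseOutputCodec
helper, and the plain Map standing in for the job configuration are all hypothetical, and
only the two property names come from this review. The fallback codec is passed in by the
caller, since this sketch makes no assumption about how the real patch resolves the default.

```java
import java.util.HashMap;
import java.util.Map;

public class IntermediateCompressionSketch {
    // Property names as they appear in this review request.
    static final String COMPRESS_INTERMEDIATE = "hive.exec.compress.intermediate";
    static final String INTERMEDIATE_CODEC = "hive.exec.inter.mapred.compression.codec";

    /** Hypothetical stand-in for MapredWork / MapredLocalWork. */
    static class Work {
        // Defaults to true; cleared when the task that moves data into a
        // table or file is generated for this work.
        private boolean intermediate = true;

        boolean isIntermediate() { return intermediate; }
        void setIntermediate(boolean value) { intermediate = value; }
    }

    /**
     * Hypothetical helper: returns the codec class name to use for the reduce
     * output of this work, or null if the job's own output settings should be
     * left untouched.
     */
    static String chooseOutputCodec(Work work, Map<String, String> conf, String fallbackCodec) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(COMPRESS_INTERMEDIATE, "true"));
        if (!enabled || !work.isIntermediate()) {
            return null;
        }
        // Use the configured codec if one is set, otherwise the caller's fallback.
        return conf.getOrDefault(INTERMEDIATE_CODEC, fallbackCodec);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(INTERMEDIATE_CODEC, "org.apache.hadoop.io.compress.GzipCodec");

        Work intermediateWork = new Work();   // flag stays at its default (true)
        Work finalWork = new Work();
        finalWork.setIntermediate(false);     // as if a move task had been generated

        String fallback = "org.apache.hadoop.io.compress.DefaultCodec";
        System.out.println(chooseOutputCodec(intermediateWork, conf, fallback));
        System.out.println(chooseOutputCodec(finalWork, conf, fallback));
    }
}
```

In this sketch the final (non-intermediate) work returns null, so whatever compression the
query requests for its real output is not overridden by the intermediate setting.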


This addresses bug HIVE-2374.
    https://issues.apache.org/jira/browse/HIVE-2374


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1164667 
  trunk/conf/hive-default.xml 1164667 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1164667 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 1164667 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1164667 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 1164667 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1164667 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1164667 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsIntermediateHook.java PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/intermediate_compression.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/auto_join0.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join10.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join11.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join12.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join13.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join15.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join16.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join18.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join2.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join20.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join21.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join22.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join23.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join24.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join26.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join27.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join28.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join29.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join30.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/auto_join31.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/cluster.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/ctas.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby1.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby10.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby11.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby1_limit.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby1_map_skew.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby2_map_skew.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby3.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby3_map_skew.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby4.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby5.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby6.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby6_map_skew.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby8.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby8_map.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby8_map_skew.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby8_noskew.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/groupby9.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/index_auto_self_join.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/index_bitmap3.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/index_bitmap_auto.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/innerjoin.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/input14_limit.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/input1_limit.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/input25.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/input26.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/input39.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/input3_limit.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/input4_limit.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/insert_into3.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/intermediate_compression.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/join0.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join13.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join15.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join18.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join18_multi_distinct.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join19.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join2.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join20.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join21.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join22.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join23.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join29.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join30.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join31.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join32.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join33.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join35.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join38.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join40.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join_hive_626.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join_reorder.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join_reorder2.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/join_reorder3.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/lateral_view.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/lineage1.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/mapjoin_distinct.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/mapjoin_subquery.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/merge4.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/multi_insert.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/multigroupby_singlemr.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/no_hooks.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/nullgroup.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/nullgroup2.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/nullgroup4.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/parallel.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/pcr.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/ppd_gby2.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/ppd_gby_join.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/ppd_join2.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/ppd_repeated_alias.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/ppd_udf_case.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/regex_col.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/sample8.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/semijoin.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/skewjoin.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/stats1.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/udf_case_column_pruning.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/udf_explode.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/udtf_explode.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/udtf_json_tuple.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union10.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union11.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union12.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union14.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union15.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union17.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union18.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union19.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union20.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union22.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union3.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union4.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union5.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union6.q.out 1164667 
  trunk/ql/src/test/results/clientpositive/union7.q.out 1164667 

Diff: https://reviews.apache.org/r/1516/diff


Testing
-------

I added a test query and a hook to verify that the is-intermediate flag is set properly in
MapredWork and MapredLocalWork.

I also added a test to TestExecDriver that checks that the correct compression is used on
the output of the reduce for each value of the is-intermediate flag.


Thanks,

Kevin

