hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-105) estimate number of required reducers and other map-reduce parameters automatically
Date Wed, 03 Dec 2008 04:20:44 GMT
estimate number of required reducers and other map-reduce parameters automatically
----------------------------------------------------------------------------------

                 Key: HIVE-105
                 URL: https://issues.apache.org/jira/browse/HIVE-105
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


Currently, users have to specify the number of reducers. In a multi-user environment, we generally
ask users to be prudent in selecting the number of reducers (since reducers are long-running and
block other users). Also, a large number of reducers produces a large number of output files, which
puts pressure on namenode resources.

There are other map-reduce parameters, for example the min split size and the proposed use
of CombineFileInputFormat, that are also fairly tricky for the user to determine (since they
depend on map-side selectivity and cluster size). This will become critical when there
is integration with BI tools, since there will be no opportunity to optimize job settings by hand
and there will be a wide variety of jobs.

This JIRA calls for automating the selection of such parameters, possibly by making a best-effort
estimate of map-side selectivity/output size using sampling and determining such parameters
from there.
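
To make the idea concrete, here is a minimal sketch of how a reducer count might be derived
from a sampled selectivity estimate. It is only an illustration of the approach, not a proposed
implementation: the function names and the bytes_per_reducer and max_reducers knobs are
hypothetical, and the sketch assumes that a sample of the input has already been run through
the map-side operators so that its input and output sizes are known.

```python
import math


def estimate_selectivity(sample_input_bytes, sample_output_bytes):
    """Fraction of input bytes that survive the map side, measured on a sample."""
    if sample_input_bytes <= 0:
        raise ValueError("sample must contain at least one input byte")
    return sample_output_bytes / sample_input_bytes


def estimate_reducers(total_input_bytes, selectivity, bytes_per_reducer, max_reducers):
    """Choose a reducer count so that each reducer gets roughly bytes_per_reducer.

    bytes_per_reducer and max_reducers are hypothetical tuning knobs; the cap
    keeps one job from monopolizing reduce slots or creating too many output files.
    """
    estimated_map_output = total_input_bytes * selectivity
    wanted = math.ceil(estimated_map_output / bytes_per_reducer)
    # Always run at least one reducer, and never exceed the cap.
    return max(1, min(max_reducers, wanted))


# Example: the sample shrank from 1000 to 250 bytes on the map side.
selectivity = estimate_selectivity(1000, 250)
reducers = estimate_reducers(400 * 10**9, selectivity, 10**9, 999)
```

The cap on the reducer count addresses the two concerns above (blocking other users and the
number of output files); how large the sample needs to be for a usable estimate is left open.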

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

