hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1516) optimize split sizes automatically taking into account amount and nature of map tasks
Date Fri, 06 Aug 2010 19:23:17 GMT
optimize split sizes automatically taking into account amount and nature of map tasks
-------------------------------------------------------------------------------------

                 Key: HIVE-1516
                 URL: https://issues.apache.org/jira/browse/HIVE-1516
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


two immediate cases come to mind:
- pure filter jobs (i.e. no map-side sort required)
- full aggregate computations only (like count(1))

in these cases the amount of data to be sorted is zero or negligible, so mapper parallelism
(and therefore split size) should be dictated by the size of the cluster rather than by the
size of the input. there's no point running 10000 mappers on a 500-node cluster for a pure
filter job.
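
a rough sketch of the idea, not Hive's actual implementation: pick the split size so that the
mapper count roughly matches the number of map slots in the cluster. the function names, the
64 MiB floor, and the figure of 8 map slots per node are all assumptions made for illustration.

```python
MIB = 1024 ** 2


def ceil_div(a, b):
    """Integer ceiling division (avoids float rounding on large byte counts)."""
    return -(-a // b)


def suggest_split_size(total_input_bytes, map_slots, waves=1,
                       min_split_bytes=64 * MIB):
    """Return a split size that yields about map_slots * waves mappers.

    Hypothetical helper: only meaningful for jobs with little or no
    map-side sort (pure filters, full aggregates such as count(1)).
    """
    if total_input_bytes <= 0 or map_slots <= 0 or waves <= 0:
        raise ValueError("input size, map slots and waves must be positive")
    target_mappers = map_slots * waves
    split = ceil_div(total_input_bytes, target_mappers)
    # Never go below the floor, so small inputs do not get tiny splits.
    return max(split, min_split_bytes)


def mapper_count(total_input_bytes, split_bytes):
    """Number of mappers needed to cover the input at a given split size."""
    return ceil_div(total_input_bytes, split_bytes)


if __name__ == "__main__":
    # Input that would produce 10000 mappers at an assumed 256 MiB split.
    total = 10000 * 256 * MIB
    slots = 500 * 8  # 500 nodes, assumed 8 map slots per node
    split = suggest_split_size(total, slots)
    print(split // MIB, "MiB per split,", mapper_count(total, split), "mappers")
```

with these assumed numbers the job runs as a single wave of mappers instead of several, at the
cost of larger splits. raising `waves` trades that back for finer-grained tasks.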
