hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1599) optimize mapjoin to use distributedcache
Date Thu, 26 Aug 2010 01:29:16 GMT
optimize mapjoin to use distributedcache

                 Key: HIVE-1599
                 URL: https://issues.apache.org/jira/browse/HIVE-1599
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain
             Fix For: 0.7.0

Currently, each mapper reads the file locally in case of a mapjoin. This creates problems
if the number of mappers is very high.

It would be optimal to put the files in the distributed cache before the job starts, so that
the mappers can read them from the cache instead of reading from HDFS as they do currently.
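
As a rough illustration of why this matters, the sketch below counts how many times
the mapjoin file would be fetched from HDFS under a simplified model. It is not Hive
code. The function name, the round-robin placement of mappers on nodes, and the
assumption that a cached file is fetched once per node per job are all assumptions
made for the example.

```python
def remote_reads(num_mappers: int, num_nodes: int, use_cache: bool) -> int:
    """Count HDFS fetches of the mapjoin file under a simplified model.

    Hypothetical helper: assumes mappers are placed on nodes round-robin and
    that a cached file is fetched from HDFS once per node that runs a mapper.
    """
    if num_mappers < 0 or num_nodes <= 0:
        raise ValueError("num_mappers must be >= 0 and num_nodes must be > 0")

    if not use_cache:
        # Current behaviour in this model: every mapper fetches the file itself.
        return num_mappers

    # With the cache: one fetch per node that hosts at least one mapper.
    nodes_used = {mapper % num_nodes for mapper in range(num_mappers)}
    return len(nodes_used)


if __name__ == "__main__":
    for mappers in (100, 10_000):
        direct = remote_reads(mappers, num_nodes=200, use_cache=False)
        cached = remote_reads(mappers, num_nodes=200, use_cache=True)
        print(f"{mappers} mappers: {direct} fetches without cache, {cached} with cache")
```

In this model the number of fetches without the cache grows with the number of
mappers, while with the cache it is bounded by the number of nodes.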

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
