hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-968) map join may lead to very large files
Date Thu, 03 Dec 2009 02:47:29 GMT
map join may lead to very large files

                 Key: HIVE-968
                 URL: https://issues.apache.org/jira/browse/HIVE-968
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain

If the table being map-joined is very large, it may produce very large files on the mappers.
The job may never complete, and the files will never be cleaned from the tmp directory. 
It would be great if the table could be placed in the distributed cache, but minimally the following
should be added:

If the table (source) being joined produces a very large file, it should just throw an error.
New configuration parameters can be added to limit the number of rows or the size of the file.
The mapper should not be retried 4 times; it should fail immediately.
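A minimal sketch of the fail-fast guard proposed above, assuming hypothetical limits and names (MAX_ROWS, MAX_BYTES, MapJoinGuard are illustrative, not existing Hive configuration):

```java
// Hedged sketch: a guard the map-join side could apply while buffering rows
// from the joined table, throwing immediately instead of spilling huge files
// to tmp. All names and limit values here are hypothetical.
public class MapJoinGuard {
    static final long MAX_ROWS = 1_000_000L;   // hypothetical row limit
    static final long MAX_BYTES = 1L << 30;    // hypothetical 1 GB size limit

    private long rows = 0;
    private long bytes = 0;

    /** Account for one buffered row; fail fast once a limit is exceeded. */
    public void addRow(long rowSizeBytes) {
        rows++;
        bytes += rowSizeBytes;
        if (rows > MAX_ROWS || bytes > MAX_BYTES) {
            throw new IllegalStateException(
                "map join table exceeded limit: rows=" + rows + " bytes=" + bytes);
        }
    }
}
```

With such a guard, the failure surfaces on the first oversized row rather than after the task has been retried and filled the tmp directory.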

I can't think of a better way for the mapper to communicate with the client than for it to
write to some well-known
HDFS file; the client can read the file periodically (while polling), and if it sees an error
it can just kill the job, with
appropriate error messages indicating to the client why the job died.
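The error-file protocol described above could be sketched as follows. This is an assumption-laden illustration: a local file stands in for the well-known HDFS file, and the class and path are hypothetical; against real HDFS the client would use FileSystem.exists()/open() on a job-scoped path instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch of the polling protocol: the mapper writes an error marker
// to a well-known file, and the client polls that file and kills the job
// (with the recorded reason) once it appears. A local file stands in for
// HDFS here; the path and class name are hypothetical.
public class ErrorFilePoller {
    private final Path errorFile;  // hypothetical well-known per-job path

    public ErrorFilePoller(Path errorFile) {
        this.errorFile = errorFile;
    }

    /** Called by the mapper when it detects the oversized table. */
    public void reportError(String reason) throws IOException {
        Files.writeString(errorFile, reason);
    }

    /** Called by the client on each polling cycle; returns the error, if any. */
    public String pollForError() throws IOException {
        if (Files.exists(errorFile)) {
            return Files.readString(errorFile);  // client would now kill the job
        }
        return null;  // no error yet; keep polling
    }
}
```

The client's polling loop would call pollForError() on an interval and, on a non-null result, kill the job and surface the reason string to the user.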

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
