hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-318) [Hive] union all queries broken - all kinds of problems
Date Tue, 03 Mar 2009 18:44:56 GMT
[Hive] union all queries broken - all kinds of problems
-------------------------------------------------------

                 Key: HIVE-318
                 URL: https://issues.apache.org/jira/browse/HIVE-318
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Namit Jain


1. Map-only job : same input
   Hangs because the mapper tries to open the same file twice, and the Hadoop filesystem complains.

   Fix: Initialize only once by keeping initialization state at the Operator level. Close
should be handled the same way.
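
A minimal sketch of the initialize-once idea, in Python for brevity. The `Operator` class, its fields, and the branch names are hypothetical stand-ins, not Hive's actual classes; the counters only stand in for opening and closing a real file.

```python
class Operator:
    """Hypothetical stand-in for a plan operator reached from several parents."""

    def __init__(self, name):
        self.name = name
        self.initialized = False
        self.closed = False
        self.open_count = 0   # times the underlying resource was "opened"
        self.close_count = 0  # times the underlying resource was "closed"

    def initialize(self):
        # Guard: a second call (from the other branch of the union) is a no-op.
        if self.initialized:
            return
        self.initialized = True
        self.open_count += 1

    def close(self):
        # Same guard for close: only the first call after initialize does work.
        if not self.initialized or self.closed:
            return
        self.closed = True
        self.close_count += 1


sink = Operator("file_sink")
for _branch in ("subquery_1", "subquery_2"):
    sink.initialize()
for _branch in ("subquery_1", "subquery_2"):
    sink.close()
```

Both branches call `initialize` and `close`, but the state kept on the operator ensures the resource is touched only once in each direction.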

2. Map-only job : different inputs
   Data is lost because of the rename.

   Fix: instead of renaming, move the files into the directory.
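
A rough local-filesystem analogy of the fix, in Python. It does not use Hadoop's FileSystem API, and the report does not spell out the exact failure mode, so the helper `move_files_into`, the directory names, and the suffix-on-clash rule are all assumptions made for illustration.

```python
import os
import shutil
import tempfile


def move_files_into(src_dir, dest_dir):
    """Move every file in src_dir into dest_dir, keeping files already there."""
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        target = os.path.join(dest_dir, name)
        suffix = 0
        # Assumption: on a name clash, add a numeric suffix instead of overwriting.
        while os.path.exists(target):
            suffix += 1
            target = os.path.join(dest_dir, f"{name}_{suffix}")
        shutil.move(os.path.join(src_dir, name), target)
        moved.append(os.path.basename(target))
    return moved


def demo():
    """Two inputs each produce part-00000; both must survive in the final directory."""
    root = tempfile.mkdtemp()
    dest = os.path.join(root, "final")
    for branch in ("tmp_a", "tmp_b"):
        branch_dir = os.path.join(root, branch)
        os.makedirs(branch_dir)
        with open(os.path.join(branch_dir, "part-00000"), "w") as handle:
            handle.write(branch)
        move_files_into(branch_dir, dest)
    result = sorted(os.listdir(dest))
    shutil.rmtree(root)
    return result
```

Moving files one by one into a shared directory keeps the output of every input, whereas renaming a whole temporary directory onto the same target for each input risks one result replacing another.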

3. Map-only job in sub-query + ReduceSink (RS): currently works

4. Two variables, so 4 sub-cases:

   Number of sub-queries having map-reduce jobs (1 or 2)
   Operator after Union (RS or FS, i.e. ReduceSink or FileSink)
   


a.   Number of sub-queries having map-reduce jobs: 1
     Operator after Union: RS


     Can be done in 2 MR jobs, but that is really difficult with the current infrastructure.
     Should do with 3 MR jobs for now - break on top of UNION.
     Future optimization: move the operators between Union and RS to before the Union.


b.   Number of sub-queries having map-reduce jobs: 2
     Operator after Union: RS


     Needs 3 MR jobs. Should do with 3 MR jobs - break on top of UNION.
     Future optimization: move the operators between Union and RS to before the Union.


c.   Number of sub-queries having map-reduce jobs: 1
     Operator after Union: FS


     Can be done in 1 MR job, but that is really difficult with the current infrastructure.
     Can be easily done with 2 MR jobs by removing UNION and cloning the operators between
     Union and FS.
     Should do with 3 MR jobs for now - break on top of UNION.
     Follow-up optimization: 2 MR jobs should be able to handle this case.


d.   Number of sub-queries having map-reduce jobs: 2
     Operator after Union: FS


     Can be easily done with 2 MR jobs by removing UNION and cloning the operators between
     Union and FS.
     Should do with 3 MR jobs for now - break on top of UNION.
     Follow-up optimization: 2 MR jobs should be able to handle this case.
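
The four sub-cases above can be summarized as a small lookup table. This is only a restatement of the report in Python; the names `CASES` and `planned_jobs` are hypothetical, "possible" is the smallest job count the report mentions for that case, and "planned" is the interim 3-job plan.

```python
# Key: (sub-queries having map-reduce jobs, operator after UNION).
CASES = {
    (1, "RS"): {"case": "a", "possible": 2, "planned": 3},
    (2, "RS"): {"case": "b", "possible": 3, "planned": 3},
    (1, "FS"): {"case": "c", "possible": 1, "planned": 3},
    (2, "FS"): {"case": "d", "possible": 2, "planned": 3},
}


def planned_jobs(mr_subqueries, op_after_union):
    """Return the interim number of MR jobs for one sub-case."""
    return CASES[(mr_subqueries, op_after_union)]["planned"]


def jobs_saved_by_optimization(mr_subqueries, op_after_union):
    """Jobs that the later optimizations mentioned in the report could remove."""
    entry = CASES[(mr_subqueries, op_after_union)]
    return entry["planned"] - entry["possible"]
```

For example, `jobs_saved_by_optimization(2, "RS")` is 0 because case b needs 3 jobs regardless, while case c has the most room for improvement.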


