hadoop-hive-dev mailing list archives

From "Prasad Chakka (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-318) [Hive] union all queries broken - all kinds of problems
Date Tue, 10 Mar 2009 18:26:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680570#action_12680570 ]

Prasad Chakka commented on HIVE-318:
------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:325
 why are you catching an exception that is never thrown?

ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:67
 shouldn't you throw an error if the file exists instead of asserting?
 convert some of the info log messages to error
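
A minimal sketch of how the two FileSinkOperator points might fit together, assuming the "move files into the directory" fix from item 2 of the issue description. It uses java.nio.file on the local filesystem as a stand-in for the Hadoop FileSystem API, and MoveIntoDir / moveFilesInto are hypothetical names, not code from hive.318.patch. A destination file that already exists raises an exception instead of tripping an assert, which the JVM skips unless it runs with -ea.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the code in hive.318.patch.
final class MoveIntoDir {
    private MoveIntoDir() {}

    // Moves each file of srcDir into destDir one by one instead of renaming
    // srcDir onto destDir, so output already in destDir (e.g. from the other
    // branch of a union) is kept. Returns the number of files moved.
    static int moveFilesInto(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        // List the source files first so the directory is not modified mid-iteration.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        int moved = 0;
        for (Path src : files) {
            Path dest = destDir.resolve(src.getFileName());
            if (Files.exists(dest)) {
                // Fail loudly: unlike `assert`, this check runs even without -ea.
                throw new FileAlreadyExistsException(dest.toString());
            }
            Files.move(src, dest);
            moved++;
        }
        return moved;
    }
}
```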

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:134, 153
 remove the System.out

ql/src/java/org/apache/hadoop/hive/ql/lib/DefaultRuleDispatcher.java:80
 can you make stack the 3rd parameter of the process function, since most process functions
don't use this param?

ql/src/java/org/apache/hadoop/hive/ql/plan/fileSinkDesc.java:35 & 47
 javadoc?

ql/src/java/org/apache/hadoop/hive/ql/plan/unionDesc.java
 empty class? add a comment

unit tests:
can you add comments to each unit test so that it is easy to figure out which case of
union it is testing?

will add more comments later.





> [Hive] union all queries broken - all kinds of problems
> -------------------------------------------------------
>
>                 Key: HIVE-318
>                 URL: https://issues.apache.org/jira/browse/HIVE-318
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.318.patch
>
>
> 1. Map-only job : same input
>    Hangs because the mapper tries to open the same file twice, and the Hadoop filesystem complains.
>    Fix: Only initialize once - keep the initialization state at the Operator level. Should do the same for Close.
> 2. Map-only job : different inputs
>    Loss of data due to rename.
>    Fix: change rename to move files to the directory.
> 3. Map-only job in subquery + RedSink: works currently
> 4. 2 variables: so 4 sub-cases
>    Number of sub-queries having map-reduce jobs. (1/2)
>    Operator after Union (RS/FS)
>    
> a.   Number of sub-queries having map-reduce jobs. 1
>      Operator after Union: RS
>      Can be done in 2MR - really difficult with current infrastructure.
>      Should do with 3 MR jobs now - break on top of UNION. 
>      Future optimization: move operators between Union and RS before Union.
> b.   Number of sub-queries having map-reduce jobs. 2
>      Operator after Union: RS
>      Needs 3MR - Should do with 3 MR jobs - break on top of UNION. 
>      Future optimization: move operators between Union and RS before Union.
> c.   Number of sub-queries having map-reduce jobs. 1
>      Operator after Union: FS
>      Can be done in 1MR - really difficult with current infrastructure.
>      Can be easily done with 2 MR by removing UNION and cloning operators between Union and FS.
>      Should do with 3 MR jobs now - break on top of UNION. 
>      Followup optimization: 2MR should be able to handle
> d.   Number of sub-queries having map-reduce jobs. 2
>      Operator after Union: FS
>      Can be easily done with 2 MR by removing UNION and cloning operators between Union and FS.
>      Should do with 3 MR jobs now - break on top of UNION. 
>      Followup optimization: 2MR should be able to handle
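
For item 1, one way to "only initialize once" is an idempotent guard kept on the operator itself. The sketch below uses a hypothetical SketchOperator class, not Hive's Operator, and openCount stands in for opening a sink's output file. For close it goes one step beyond "do the same": it assumes an operator below a UNION should only close once every parent has called close, since another branch may still be sending it rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down operator for illustration; not Hive's Operator.
class SketchOperator {
    private final List<SketchOperator> children = new ArrayList<>();
    private int numParents = 0;
    private boolean initialized = false;
    private int closeCalls = 0;
    private boolean closed = false;
    int openCount = 0; // stands in for opening the output file of a sink

    SketchOperator addChild(SketchOperator child) {
        children.add(child);
        child.numParents++;
        return child;
    }

    // Called once per parent; only the first call does any work, so an
    // operator below a UNION is not opened twice.
    void initialize() {
        if (initialized) {
            return;
        }
        initialized = true;
        openCount++;
        for (SketchOperator child : children) {
            child.initialize();
        }
    }

    // Called once per parent; the operator closes only when the last parent
    // (or, for a root with no parents, the first caller) is done.
    void close() {
        closeCalls++;
        if (closed || closeCalls < Math.max(numParents, 1)) {
            return;
        }
        closed = true;
        for (SketchOperator child : children) {
            child.close();
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

With two branches of the same input meeting at a UNION above a sink, initializing the root opens the sink once rather than twice, and the UNION stays open until both branches have closed.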

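Cases c and d mention removing the UNION and cloning the operators between it and the FS, which is left as a follow-up (the patch breaks on top of UNION instead). A rough sketch of that plan rewrite, assuming a hypothetical PlanNode tree without parent pointers and a linear chain of operators below the UNION; a real clone would copy each operator's descriptor, not just its name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan node for illustration; not Hive's operator classes.
final class PlanNode {
    final String name;
    final List<PlanNode> children = new ArrayList<>();

    PlanNode(String name) {
        this.name = name;
    }

    PlanNode add(PlanNode child) {
        children.add(child);
        return child;
    }

    // Removes `union` by giving each of its parents a private copy of the
    // operator chain between the union and the file sink (e.g. SEL -> FS).
    // Assumes that chain is linear: every operator on it has one child.
    static void removeUnion(List<PlanNode> unionParents, PlanNode union) {
        List<String> chain = new ArrayList<>();
        PlanNode cur = union;
        while (cur.children.size() == 1) {
            cur = cur.children.get(0);
            chain.add(cur.name);
        }
        if (!cur.children.isEmpty()) {
            throw new IllegalArgumentException("operators below the union are not a linear chain");
        }
        for (PlanNode parent : unionParents) {
            parent.children.remove(union);
            PlanNode tail = parent;
            for (String op : chain) {
                // Name-only copy; a real clone would copy the operator's descriptor.
                tail = tail.add(new PlanNode(op));
            }
        }
    }
}
```

After the rewrite each branch ends in its own copy of the chain, for example SEL -> FS, and no UNION remains in the plan.
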
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

