incubator-hcatalog-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HCATALOG-580) Optimizations in HCAT-538 break e2e tests
Date Mon, 31 Dec 2012 20:08:12 GMT

    [ https://issues.apache.org/jira/browse/HCATALOG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541482#comment-13541482 ]

Daniel Dai commented on HCATALOG-580:
-------------------------------------

HCATALOG-580-1.patch fixes Pig_Checkin_7 but does not fix Pig_Checkin_1. HCATALOG-580-2.patch
fixes both Pig_Checkin_1 and Pig_Checkin_7 (and all tests pass).

HCATALOG-580 and HCATALOG-584 each fix all the tests. HCATALOG-580 fixes them by correcting
the logic that 538 introduced; HCATALOG-584 fixes them by removing the direct cause of the
failures. We need to commit both: 580 as the fix, 584 as bulletproofing.

Here are more details about the issue 538 introduced and how 580 and 584 fix it:
1. 538 optimizes NameNode usage by moving the partition directory instead of each leaf file.
2. 538 finds the partition directory by assuming that the first child of that directory is a
file, which is wrong (the first child can be _temporary or _logs).
3. 538 moves the partition directory out assuming it is two levels deep (_TEMP/partition),
which is wrong for _temporary and _logs and raises a "directory does not exist" exception
for them.
4. 584 solves the issue by creating the directory before moving the partition out (fs.rename);
580 solves it by fixing the 538 logic.
5. 584 makes sure the rename succeeds, but loses some of the optimization 538 introduced (a
folder containing _temporary will not be treated as a partition folder, so the partition
will not be moved as a whole).
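
To illustrate point 2, here is a minimal sketch on a local filesystem. It is not the HCatalog code and does not claim to match what the 580 patch does. The function names, the partition name `ds=20110924`, and the sorted listing order are assumptions made for the example; Hadoop's own listing order and API differ.

```python
import os
import tempfile

def first_child_is_file(path):
    """Shortcut in the spirit of 538: look only at the first child."""
    # Sorting is an assumption that makes the example deterministic.
    children = sorted(os.listdir(path))
    return bool(children) and os.path.isfile(os.path.join(path, children[0]))

def has_data_file(path):
    """Stricter check: any regular-file child marks a partition directory."""
    return any(
        os.path.isfile(os.path.join(path, name))
        for name in os.listdir(path)
    )

def build_example(root):
    """Create _TEMP/ds=20110924 holding a _temporary dir and one part file."""
    part_dir = os.path.join(root, "_TEMP", "ds=20110924")
    os.makedirs(os.path.join(part_dir, "_temporary"))
    with open(os.path.join(part_dir, "part-00000"), "w") as f:
        f.write("alice\t20\n")
    return part_dir

with tempfile.TemporaryDirectory() as root:
    part_dir = build_example(root)
    shortcut_result = first_child_is_file(part_dir)
    scan_result = has_data_file(part_dir)
    print("first-child shortcut says partition dir:", shortcut_result)
    print("full scan says partition dir:", scan_result)
```

Because `_temporary` is listed before `part-00000`, the shortcut misclassifies the directory, while scanning all children still recognizes it as holding data.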
                
> Optimizations in HCAT-538 break e2e tests
> -----------------------------------------
>
>                 Key: HCATALOG-580
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-580
>             Project: HCatalog
>          Issue Type: Bug
>    Affects Versions: 0.5
>         Environment: RH 5.8 (on AWS)
> Hadoop 1.1.2.17 (build)
> HCat 0.5 (build)
>            Reporter: Sushanth Sowmyan
>            Assignee: Daniel Dai
>            Priority: Blocker
>             Fix For: 0.5
>
>         Attachments: HCATALOG-580-1.patch, HCATALOG-580-2.patch
>
>
> The optimizations brought in by HCATALOG-538 break dynamic partitioning in the e2e tests.
> The issue is the assumption that if the first child in a directory structure is a directory,
> the rest are directories, and if the first child is a file, the rest are files. That
> assumption is incorrect.
> (Admittedly, one part of it, assuming that a directory whose first child is a file is a
> leaf directory, is not necessarily a bad premise, although it is still incorrect.)
> The underlying FileOutputCommitter and OutputFormat behaviour affects whether you get
> files or directories, and whether any _temporary directories are left behind, for example.
> In the case I tested, there is a _temporary directory in a "leaf" directory,
> followed by part files. The optimization sees the _temporary directory, finds nothing inside
> it, so doesn't mkdir any parent, then decides that the rest are directories, then moves to
> the part file, and tries to rename it directly without mkdir-ing its parent directory.
> The e2e test conf in question is Pig_Checkin_7
> {code}
>                 {
>                                  'num' => 7
>                                 ,'hcat_prep'=>q\drop table if exists pig_checkin_7;
> create table pig_checkin_7 (name string, age int) partitioned by (ds string) STORED AS TEXTFILE;\
>                                 ,'pig' => q\a = load 'studentparttab30k' using org.apache.hcatalog.pig.HCatLoader();
> b = foreach a generate name, age, ds;
> store b into 'pig_checkin_7' using org.apache.hcatalog.pig.HCatStorer();\,
>                                 ,'result_table' => 'pig_checkin_7',
>                                 ,'sql'   => "select name, age, ds from studentparttab30k;",
>                                 ,'floatpostprocess' => 1
>                                 ,'delimiter' => '       '
>                 }
> {code}
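
The failure described in the quoted report, a rename whose destination parent was never created, can be reproduced with a small local-filesystem sketch. This is only an analogy: it uses Python's `os.rename` rather than Hadoop's `fs.rename`, whose error behaviour differs in detail, and the helper names and paths are hypothetical. The second helper mirrors the approach credited to 584 above (create the directory before renaming).

```python
import os
import tempfile

def move_without_mkdir(src, dst):
    """Rename directly; fails when dst's parent directory does not exist."""
    try:
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        return False

def move_with_mkdir(src, dst):
    """Create dst's parent first, then rename."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    os.rename(src, dst)
    return True

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: a part file under a scratch partition directory.
    src = os.path.join(root, "_TEMP", "ds=20110924", "part-00000")
    os.makedirs(os.path.dirname(src))
    with open(src, "w") as f:
        f.write("alice\t20\n")

    # Destination partition directory has not been created yet.
    dst = os.path.join(root, "table", "ds=20110924", "part-00000")

    direct_ok = move_without_mkdir(src, dst)
    guarded_ok = move_with_mkdir(src, dst)
    moved = os.path.isfile(dst)
    print("direct rename succeeded:", direct_ok)
    print("rename after mkdir succeeded:", guarded_ok)
```

Creating the parent unconditionally costs an extra filesystem call per move, which is the kind of overhead the 538 optimization was trying to avoid.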

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
