hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)
Date Fri, 29 Aug 2014 08:57:53 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115037#comment-14115037
] 

Hive QA commented on HIVE-7803:
-------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12665172/HIVE-7803.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-554/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12665172

> Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7803
>                 URL: https://issues.apache.org/jira/browse/HIVE-7803
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.13.1
>         Environment: 
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>            Priority: Critical
>         Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch
>
>
> One of our users reports they see intermittent failures due to attempt directories in
the input paths. We found with speculative execution turned on, two mappers tried to commit
task at the same time using the same committed task path,  which cause the corrupt output
directory. 
> The original Pig script:
> {code}
> STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
> USING org.apache.hcatalog.pig.HCatStorer();
> {code}
> Two mappers
> attempt_1405021984947_5394024_m_000523_0: KILLED
> attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
> attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
> As a result, it created corrupt directory as 
>   
> /projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
> containing 
>    part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
> and 
>    attempt_1405021984947_5394024_m_000523_1/part-m-00523
> Namenode Audit log
> ==========================
> 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=create 
> src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
>  dst=null  perm=user:group:rw-r-----
> 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  cmd=create 
> src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
>  dst=null  perm=user:group:rw-r-----
> 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=rename 
> src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
> dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
> perm=user:group:rwxr-x---
> 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  cmd=rename 
> src=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
> dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=201408050000/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
> perm=user:group:rwxr-x---
> After consulting our Hadoop core team, we was pointed out some HCat code does not participating
in the two-phase commit protocol, for example in FileRecordWriterContainer.close():
> {code}
>             for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter> entry
: baseDynamicCommitters.entrySet()) {
>                 org.apache.hadoop.mapred.TaskAttemptContext currContext = dynamicContexts.get(entry.getKey());
>                 OutputCommitter baseOutputCommitter = entry.getValue();
>                 if (baseOutputCommitter.needsTaskCommit(currContext)) {
>                     baseOutputCommitter.commitTask(currContext);
>                 }
>             }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message