ambari-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-9990) CopyFromLocal failed to copy Tez tarball to HDFS failed because multiple processes tried to copy to the same destination simultaneously
Date Wed, 11 Mar 2015 01:09:38 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356058#comment-14356058 ]

Hadoop QA commented on AMBARI-9990:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12703793/AMBARI-9990.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified
tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/Ambari-trunk-test-patch/1995//testReport/
Console output: https://builds.apache.org/job/Ambari-trunk-test-patch/1995//console

This message is automatically generated.

> CopyFromLocal failed to copy Tez tarball to HDFS failed because multiple processes tried
to copy to the same destination simultaneously
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-9990
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9990
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-9990.patch, hadoop-hdfs-datanode-c6408.ambari.apache.org.log,
hadoop-hdfs-datanode-c6410.ambari.apache.org.log, hadoop-hdfs-namenode-c6408.ambari.apache.org.log,
hdfs-audit.log
>
>
> Pig Service Check and Hive Server 2 START ran on two different machines during the stack installation and failed to copy the Tez tarball to HDFS.
> I was able to reproduce this locally by calling copyFromLocal from two clients simultaneously. See the HDFS audit log, the datanode logs on c6408 and c6410, and the namenode log on c6408.
> The copyFromLocal command behaves as follows:
> * It tries to create a temporary file <filename>._COPYING_ and writes the real data there.
> * If it hits any exception, it deletes the file named <filename>._COPYING_.
> Thus we have the following race condition in this test:
> Process P1 created the file "tez.tar.gz._COPYING_" and wrote data to it.
> Process P2 ran the same copyFromLocal command and hit an exception because it could not get the lease.
> P2 then deleted the file "tez.tar.gz._COPYING_".
> P1 could not close the file "tez.tar.gz._COPYING_" because P2 had deleted it. The exception would say "could not find lease for file..."
> In general, the copyFromLocal command gives no synchronization guarantee when several clients copy to the same destination.
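
The interleaving described above can be replayed deterministically. The following is only a minimal sketch in Python, not HDFS code: `ToyNamespace`, `LeaseError`, and `replay_race` are hypothetical names, and the in-memory lease table is a simplified assumption about how the namenode tracks writers.

```python
class LeaseError(Exception):
    """Raised when a client cannot obtain or find the write lease for a path."""


class ToyNamespace:
    """Tiny in-memory stand-in (an assumption) for the namenode's file and lease tables."""

    def __init__(self):
        self.files = {}   # path -> bytes written so far
        self.leases = {}  # path -> client currently holding the write lease

    def create(self, path, client):
        # Only one writer may hold the lease on a path at a time.
        if path in self.leases:
            raise LeaseError(f"{client} could not get the lease for {path}")
        self.files[path] = b""
        self.leases[path] = client

    def write(self, path, client, data):
        if self.leases.get(path) != client:
            raise LeaseError(f"could not find lease for file {path}")
        self.files[path] += data

    def close(self, path, client):
        if self.leases.get(path) != client:
            raise LeaseError(f"could not find lease for file {path}")
        del self.leases[path]

    def delete(self, path):
        # Deleting a file also drops whatever lease was held on it.
        self.files.pop(path, None)
        self.leases.pop(path, None)


def replay_race():
    """Replay the P1/P2 interleaving from the issue description, step by step."""
    ns = ToyNamespace()
    tmp = "tez.tar.gz._COPYING_"
    events = []

    # P1 creates the shared temporary file and writes data to it.
    ns.create(tmp, "P1")
    ns.write(tmp, "P1", b"tarball bytes")
    events.append("P1 wrote")

    # P2 runs the same copy, fails to get the lease, and cleans up the
    # temporary file, which is the same file P1 is still writing.
    try:
        ns.create(tmp, "P2")
    except LeaseError:
        ns.delete(tmp)
        events.append("P2 failed and deleted")

    # P1 tries to close the file, but the file and its lease are gone.
    try:
        ns.close(tmp, "P1")
        events.append("P1 closed")
    except LeaseError:
        events.append("P1 close failed")

    return events, ns


if __name__ == "__main__":
    for event in replay_race()[0]:
        print(event)
```

The point of the sketch is that P2's cleanup is keyed on the shared temporary name rather than on anything P2 itself created, so a failed client can remove another client's in-progress file.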
> One solution is to have each client copy to a unique destination file name and then rename it to the final name. Because the mv command is synchronized by the namenode, at least one of them will succeed in naming the file.
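
The proposed approach can be sketched as follows. This is an illustration in Python, not the attached patch: `ToyNamespace` and `copy_with_unique_tmp` are hypothetical names, a lock stands in for the namenode's synchronization of `mv`, and the sketch assumes the rename refuses to overwrite an existing destination.

```python
import threading
import uuid


class ToyNamespace:
    """In-memory stand-in (an assumption) for a namespace with a synchronized rename."""

    def __init__(self):
        self._lock = threading.Lock()
        self.files = {}  # path -> file contents

    def put(self, path, data):
        with self._lock:
            self.files[path] = data

    def rename(self, src, dst):
        """Move src to dst under the lock; return False if dst already exists."""
        with self._lock:
            if src not in self.files or dst in self.files:
                return False
            self.files[dst] = self.files.pop(src)
            return True

    def delete(self, path):
        with self._lock:
            self.files.pop(path, None)


def copy_with_unique_tmp(ns, dst, data):
    """Copy to a per-client temporary name, then rename it to the final name."""
    tmp = f"{dst}.{uuid.uuid4().hex}"  # unique per call, so clients never share it
    ns.put(tmp, data)
    won = ns.rename(tmp, dst)
    if not won:
        # Clean up only our own temporary file; other clients are unaffected.
        ns.delete(tmp)
    return won


def run_two_clients():
    """Run two concurrent copies to the same destination and collect the outcomes."""
    ns = ToyNamespace()
    results = []

    def worker():
        results.append(copy_with_unique_tmp(ns, "tez.tar.gz", b"tarball bytes"))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), ns.files


if __name__ == "__main__":
    print(run_two_clients())
```

In this toy model exactly one client wins the rename, and the losing client removes only its own temporary file, so neither client can invalidate the other's in-progress write.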



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
