ambari-dev mailing list archives

From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Re: Review Request 31878: CopyFromLocal failed to copy Tez tarball to HDFS because multiple processes tried to copy to the same destination simultaneously
Date Tue, 10 Mar 2015 18:36:28 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31878/#review75924
-----------------------------------------------------------

Ship it!


I agree that we can let NameNode do the locking here. I don't care if both agents do the same
work and the last one in wins.


ambari-common/src/main/python/resource_management/libraries/functions/dynamic_variable_interpretation.py
<https://reviews.apache.org/r/31878/#comment123252>

    That's a lot of code to do something as simple as 
    
    ```
    unique_string = str(uuid.uuid4())[:8]
    ```
    
    I know we don't need UUID power here, but it's concise and makes the code cleaner.
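
    For reference, a self-contained version of that snippet (the destination path below is only an example taken from this review, not the actual patch code):
    
    ```
    import uuid
    
    destination = "/hdp/apps/2.2.2.0-2538/tez/tez.tar.gz"
    unique_string = str(uuid.uuid4())[:8]
    # e.g. /hdp/apps/2.2.2.0-2538/tez/tez.tar.gz.3fa91c2b
    temp_destination = "{0}.{1}".format(destination, unique_string)
    ```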


- Jonathan Hurley


On March 9, 2015, 9:41 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31878/
> -----------------------------------------------------------
> 
> (Updated March 9, 2015, 9:41 p.m.)
> 
> 
> Review request for Ambari, Andrew Onischuk, Jonathan Hurley, Nate Cole, and Sid Wagle.
> 
> 
> Bugs: AMBARI-9990
>     https://issues.apache.org/jira/browse/AMBARI-9990
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Pig Service Check and Hive Server 2 START ran on 2 different machines during the stack installation and failed to copy the Tez tarball to HDFS.
> 
> I was able to reproduce this locally by calling copyFromLocal from two clients simultaneously. See the HDFS audit log, the DataNode logs on c6408 & c6410, and the NameNode log on c6410.
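> 
> For illustration only (not part of the patch; the local and HDFS paths below are assumptions, not taken from this cluster):
> 
> ```
> import subprocess
> 
> # Hypothetical repro sketch: two processes race to copy the same local file
> # to the same HDFS destination, mimicking the P1/P2 scenario described below.
> cmd = ["hadoop", "fs", "-copyFromLocal",
>        "/tmp/tez.tar.gz",                          # assumed local tarball
>        "/hdp/apps/2.2.2.0-2538/tez/tez.tar.gz"]    # assumed HDFS destination
> procs = [subprocess.Popen(cmd) for _ in range(2)]
> exit_codes = [p.wait() for p in procs]
> print(exit_codes)  # typically one of the two fails with a lease-related error
> ```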
> 
> The copyFromLocal command's behavior is:
> * Try to create a temporary file <filename>._COPYING_ and write the real data there
> * If any exception is hit, delete the file named <filename>._COPYING_
> 
> Thus we have the following race condition in this test:
> * Process P1 created the file "tez.tar.gz._COPYING_" and wrote data to it.
> * Process P2 fired the same copyFromLocal command and hit an exception because it could not get the lease.
> * P2 then deleted the file "tez.tar.gz._COPYING_".
> * P1 could not close the file "tez.tar.gz._COPYING_" since it had been deleted by P2; the exception says "could not find lease for file...".
> 
> In general, the "copyFromLocal" command does not give us the synchronization guarantee we need here.
> 
> One solution is to make the destination file name unique. Because the mv command is synchronized by the NameNode, at least one of the processes will succeed in renaming its file into place.
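> 
> A minimal sketch of that approach (the paths and subprocess calls here are illustrative assumptions, not the exact code in dynamic_variable_interpretation.py):
> 
> ```
> import subprocess
> import uuid
> 
> source = "/tmp/tez.tar.gz"                               # assumed local tarball
> destination = "/hdp/apps/2.2.2.0-2538/tez/tez.tar.gz"    # final HDFS path
> temp_destination = "{0}.{1}".format(destination, str(uuid.uuid4())[:8])
> 
> # Each process writes to its own unique temporary name, so concurrent copies
> # never collide on the same ._COPYING_ file.
> subprocess.check_call(["hadoop", "fs", "-copyFromLocal", source, temp_destination])
> 
> # The rename is synchronized by the NameNode; at least one rename wins.
> try:
>     subprocess.check_call(["hadoop", "fs", "-mv", temp_destination, destination])
> except subprocess.CalledProcessError:
>     # Another process already renamed its copy into place; losing this race
>     # is acceptable, so just clean up the temporary copy.
>     subprocess.call(["hadoop", "fs", "-rm", temp_destination])
> ```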
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/functions/dynamic_variable_interpretation.py 00b8d70 
> 
> Diff: https://reviews.apache.org/r/31878/diff/
> 
> 
> Testing
> -------
> 
> Unit tests on builds.apache.org passed:
> https://builds.apache.org/job/Ambari-trunk-test-patch/1977/
> 
> I also deployed a cluster and verified that it was able to copy the tarballs to HDFS when installing YARN, Hive, and Pig.
> 
> [root@c6408 ~]# su - hdfs -c 'hadoop fs -ls -R /hdp/apps/2.2.2.0-2538/'
> dr-xr-xr-x   - hdfs hdfs          0 2015-03-10 00:55 /hdp/apps/2.2.2.0-2538/hive
> -r--r--r--   3 hdfs hadoop   82982575 2015-03-10 00:55 /hdp/apps/2.2.2.0-2538/hive/hive.tar.gz
> dr-xr-xr-x   - hdfs hdfs            0 2015-03-10 00:57 /hdp/apps/2.2.2.0-2538/mapreduce
> -r--r--r--   3 hdfs hadoop     105000 2015-03-10 00:57 /hdp/apps/2.2.2.0-2538/mapreduce/hadoop-streaming.jar
> -r--r--r--   3 hdfs hadoop  192699956 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/mapreduce/mapreduce.tar.gz
> dr-xr-xr-x   - hdfs hdfs            0 2015-03-10 00:56 /hdp/apps/2.2.2.0-2538/pig
> -r--r--r--   3 hdfs hadoop   97542246 2015-03-10 00:56 /hdp/apps/2.2.2.0-2538/pig/pig.tar.gz
> dr-xr-xr-x   - hdfs hdfs            0 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/tez
> -r--r--r--   3 hdfs hadoop   40656789 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/tez/tez.tar.gz
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>

