mesos-dev mailing list archives

From "Ankur Chauhan" <an...@malloc64.com>
Subject Re: Review Request 27483: Fetcher uses hadoop to fetch URIs regardless of the url scheme.
Date Wed, 05 Nov 2014 06:42:04 GMT


> On Nov. 4, 2014, 9:46 p.m., Vinod Kone wrote:
> > src/launcher/fetcher.cpp, line 223
> > <https://reviews.apache.org/r/27483/diff/7/?file=748389#file748389line223>
> >
> >     Why fall through if result.isError()?
> 
> Ankur Chauhan wrote:
>     The way I think about the fetcher logic is along the lines of "try the fetchers
>     one by one until one succeeds, *even* if an unsuccessful fetcher could have handled
>     the URI". This means that if a fetcher fails for any reason (maybe the user
>     mesos-fetcher is running as doesn't have permissions, or whatever the case may be),
>     we always fall through to the next one. I understand that a case can be made for
>     something along the lines of:
>     
>     ```
>     if (local.canHandle(uri)) {
>       return local.fetch(uri);
>     } else if (os_net.canHandle(uri)) {
>       return os_net.fetch(uri);
>     } else if (hadoop.canHandle(uri)) {
>       return hadoop.fetch(uri);
>     } else {
>       return Error("Can't download uri: " + uri);
>     }
>     ```
>     
>     But that way we would be assuming that the local fetcher has all the permissions
>     needed to handle local/file URIs *and* that os_net has all the configuration needed
>     to handle http(s) and ftp(s) URIs, and it takes away the fallback. It may be the
>     case that the hadoop fetcher (and hence the hadoop client) has credentials in its
>     config files that aren't present in the URLs themselves. An example of this is FTP
>     URIs provided without a user or password while core-site.xml has the required
>     user/password in it.
>     
>     Does this answer your concern?
> 
> Vinod Kone wrote:
>     I'm concerned about how the logs will look for a user when fetching URIs. For a
>     non-local URI, they would always see a message about a failed local copy before
>     HDFS or Net is used, which is bad.
>     
>     I would much rather make it explicit what you are doing, with comments.
>     
>     ```
>     if (local.canHandle(uri)) {
>       result = local.fetch(uri);
>       if (result.isSome()) {
>         return result;
>       } else {
>         LOG(WARNING) << "Failed to copy local file: " << result.error();
>       }
>     } else if (os_net.canHandle(uri)) {
>       result = os_net.fetch(uri);
>       if (result.isSome()) {
>         return result;
>       } else {
>         LOG(WARNING) << "Failed to download URI: " << result.error();
>       }
>     }
>     
>     // If we are here, one of the following is possible:
>     //   1. Failed to copy a local URI.
>     //   2. Failed to download a remote URI using os::net.
>     //   3. An HDFS-compatible URI.
>     //   4. An unexpected URI.
>     // In all these cases we try to fetch the URI using the Hadoop client as a fallback.
>     
>     return hadoop.fetch(uri);
>     ```
>     
>     More importantly, are there really file permission issues where "cp" doesn't work
>     but "hadoop fs -copyToLocal" works? Can you give me a concrete example? I ask
>     because if the hadoop client hangs for whatever reason (we have seen this in
>     production at Twitter), it will take up to the executor registration timeout for
>     the slave to kill the executor and transition the task to LOST. Getting rid of the
>     hadoop dependency was one of the reasons we install our executors locally. This
>     change adds the dependency on hadoop back, even for local files, which is
>     unfortunate.
> 
> Ankur Chauhan wrote:
>     That does make sense; I hadn't considered the logging :-( and I am all for avoiding
>     hard external dependencies where I can. Let me think a little on that.

The latest patch should take care of this. I am following the same approach as above.


- Ankur


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27483/#review59842
-----------------------------------------------------------


On Nov. 5, 2014, 6:39 a.m., Ankur Chauhan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27483/
> -----------------------------------------------------------
> 
> (Updated Nov. 5, 2014, 6:39 a.m.)
> 
> 
> Review request for mesos, Timothy Chen, Timothy St. Clair, and Vinod Kone.
> 
> 
> Bugs: MESOS-1711
>     https://issues.apache.org/jira/browse/MESOS-1711
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Previously, the fetcher used a hardcoded list of schemes to determine which URIs could
> be fetched by hadoop (if available). This has been changed so that we first check
> whether hadoop can fetch the URI for us, and then fall back to os::net and then to a
> local copy method (same as it used to be). This allows users to fetch artifacts from
> arbitrary filesystems as long as hadoop is correctly configured (in core-site.xml).
> 
> 
> Diffs
> -----
> 
>   src/hdfs/hdfs.hpp bbfeddef106c598d8379ced085ef0605c4b2f380 
>   src/launcher/fetcher.cpp 9323c28237010fa065ef34d74435c151ded530a8 
>   src/tests/fetcher_tests.cpp d7754009a59fedb43e3422c56b3a786ce80164aa 
> 
> Diff: https://reviews.apache.org/r/27483/diff/
> 
> 
> Testing
> -------
> 
> make check
> sudo bin/mesos-tests.sh --verbose
> support/mesos-style.py
> 
> 
> Thanks,
> 
> Ankur Chauhan
> 
>

