mesos-user mailing list archives

From Ankur Chauhan <an...@malloc64.com>
Subject Why rely on url scheme for fetching?
Date Fri, 31 Oct 2014 22:27:30 GMT
Hi,

I have been looking at the fetcher code and noticed something interesting: the fetcher::fetch method depends on a hard-coded list of URL schemes. That works, but it is very restrictive.
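
To make the pattern concrete (this is only a simplified illustration, not the actual fetcher.cpp code), the dispatch today is essentially a chain of per-scheme checks, so every new scheme means another branch:

    // Simplified illustration of per-scheme dispatch; not the real fetcher code.
    #include <string>

    bool hasScheme(const std::string& uri, const std::string& scheme)
    {
      return uri.compare(0, scheme.size(), scheme) == 0;
    }

    int fetch(const std::string& uri)
    {
      if (hasScheme(uri, "hdfs://")) {
        // shell out to the hadoop client
      } else if (hasScheme(uri, "http://") || hasScheme(uri, "ftp://")) {
        // net::download
      } else if (hasScheme(uri, "file://") || uri.find("://") == std::string::npos) {
        // local copy
      } else {
        return -1; // anything else (s3://, tachyon://, ...) is unsupported
      }
      return 0;
    }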
Hadoop/HDFS, by contrast, is quite flexible about fetching from URLs: it handles a large number of URL schemes out of the box, and it can be extended to support new ones simply by adding configuration to conf/hdfs-site.xml and core-site.xml.
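
For example (going by the Tachyon and S3 client documentation rather than anything Mesos-specific), teaching Hadoop a new scheme is usually just a core-site.xml entry mapping the scheme to a FileSystem implementation on the classpath:

    <!-- core-site.xml: map the tachyon:// scheme to its FileSystem class -->
    <property>
      <name>fs.tachyon.impl</name>
      <value>tachyon.hadoop.TFS</value>
    </property>

Once that is in place, "hadoop fs -copyToLocal tachyon://... <dest>" just works, with no change to the client doing the fetching.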

What I am proposing is that we refactor fetcher.cpp to prefer the HDFS client (via hdfs/hdfs.hpp) for all fetching whenever HADOOP_HOME is set and $HADOOP_HOME/bin/hadoop is available. This logic already exists, so we can simply reuse it. The fallback logic using net::download or a local file copy could be left in place for installations that do not have Hadoop configured. With this change, whenever Hadoop is present we could directly fetch URLs such as tachyon://..., snackfs://..., cfs://..., ftp://..., s3://..., http://..., and file://... with no extra effort, which makes for a much better experience in terms of debugging and extensibility.
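
To be concrete, here is a rough sketch of the shape I have in mind. It is only a sketch under some assumptions: hadoopAvailable() and the fallback comments are placeholders rather than existing Mesos functions, and a real implementation would go through hdfs/hdfs.hpp and the usual stout utilities instead of shelling out by hand:

    // Rough sketch only; placeholder names, not actual Mesos code.
    #include <cstdlib>      // std::getenv, std::system
    #include <string>
    #include <sys/stat.h>   // ::stat, to check for the hadoop binary

    // Returns true and fills in 'hadoop' if $HADOOP_HOME/bin/hadoop exists.
    bool hadoopAvailable(std::string* hadoop)
    {
      const char* home = std::getenv("HADOOP_HOME");
      if (home == nullptr) {
        return false;
      }
      *hadoop = std::string(home) + "/bin/hadoop";
      struct stat s;
      return ::stat(hadoop->c_str(), &s) == 0;
    }

    // Fetch 'uri' into 'path', preferring the Hadoop client when present.
    int fetch(const std::string& uri, const std::string& path)
    {
      std::string hadoop;
      if (hadoopAvailable(&hadoop)) {
        // 'hadoop fs -copyToLocal' understands any scheme registered in
        // core-site.xml (hdfs://, s3://, tachyon://, file://, ...).
        const std::string command =
          hadoop + " fs -copyToLocal '" + uri + "' '" + path + "'";
        return std::system(command.c_str());
      }

      // Fallbacks for installations without Hadoop, as today:
      // - net::download() for http:// and ftp://
      // - a plain local copy for file:// and bare paths
      return -1; // placeholder for the existing fallback paths
    }

    int main()
    {
      // Hypothetical URI and destination, just to exercise fetch().
      return fetch("hdfs://namenode:8020/tmp/app.tar.gz", "/tmp/app.tar.gz");
    }

The quoting of uri and path above is naive and only for illustration; the real code would use the existing HDFS wrapper and subprocess helpers rather than std::system.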

What do others think about this?

- Ankur