hadoop-pig-dev mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: dont copy to DFS if source filesystem marked as shared
Date Fri, 08 Feb 2008 19:22:04 GMT
Great suggestion Craig! Could you open a Jira on this?

thanx
ben

On Friday 08 February 2008 01:26:11 Craig Macdonald wrote:
> Good morning,
>
> I've been playing with Pig using three setups:
>  (a) local
>  (b) hadoop mapred with hdfs
>  (c) hadoop mapred with file:///path/to/shared/fs as the default file
> system
>
> In our local setup, various NFS filesystems are shared between all
> machines (including mapred nodes)  eg /users, /local
>
> I would like Pig to note when input files are in a file:// directory
> that has been marked as shared, and hence not copy it to DFS.
>
> For comparison, the Torque PBS resource manager has a usecp directive,
> which notes when a filesystem location is shared between all nodes (and
> hence scp is not needed). See
> http://www.clusterresources.com/wiki/doku.php?id=torque:6.2_nfs_and_other_networked_filesystems
>
> It would be good to have a configurable setting in Pig that says when a
> filesystem is shared, and hence no copying between file:// and hdfs://
> is needed.
> An example in our setup might be:
>  sharedFS file:///local/
>  sharedFS file:///users/
> if configuration commands are to be used.
>
> Relatedly, if I use a fs.default.name=file:///path/to/shared/fs then the
> default file path for Pig job information is not suitable (eg
> /tmp/tempRANDOMINT is NOT shared on all nodes)
>
> C
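
For illustration only, the check Craig describes could be sketched roughly as below. This is not Pig code: the SHARED_FS list and the is_shared / needs_copy_to_dfs helpers are hypothetical names, and the two prefixes are simply the ones from his example. The idea is that an input under a file:// prefix marked as shared (or already on hdfs://) would not need copying to DFS.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical setting: file:// prefixes declared as shared on all nodes
# (taken from the "sharedFS" example in the message above).
SHARED_FS = ["file:///local/", "file:///users/"]

def is_shared(uri, shared_prefixes=SHARED_FS):
    """Return True if `uri` is a file:// location under a shared prefix."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return False
    # Normalise so that ".." segments cannot escape a shared prefix.
    path = posixpath.normpath(parsed.path)
    for prefix in shared_prefixes:
        root = posixpath.normpath(urlparse(prefix).path)
        # Match the root itself or anything strictly below it.
        if path == root or path.startswith(root.rstrip("/") + "/"):
            return True
    return False

def needs_copy_to_dfs(uri, shared_prefixes=SHARED_FS):
    """Inputs already on HDFS or on a shared file:// mount need no copy."""
    if urlparse(uri).scheme == "hdfs":
        return False
    return not is_shared(uri, shared_prefixes)
```

With these assumed prefixes, file:///users/craig/in.txt would be read in place, while file:///tmp/tempRANDOMINT would still be treated as node-local and copied.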
