hadoop-pig-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-102) Dont copy to DFS if source filesystem marked as shared
Date Mon, 10 Mar 2008 18:06:46 GMT

    [ https://issues.apache.org/jira/browse/PIG-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577100#action_12577100 ]

Benjamin Reed commented on PIG-102:
-----------------------------------

PigInputFormat doesn't look for any special prefixes; it assumes that all inputs are in HDFS.
PigOutputFormat has the same issue. So both classes will need to change: keep their current
behavior when no special prefix is present, and use the local filesystem for paths that carry
the shared: prefix.
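
For illustration only (this is not taken from the attached patch), the prefix handling
described above might be sketched as follows. The class, enum, and method names are
hypothetical, and the sketch only classifies a location string; it does not touch
PigInputFormat, PigOutputFormat, or any Hadoop API.

```java
// Hypothetical sketch: decide which filesystem a location refers to,
// based on the "shared:" prefix mentioned in the comment above.
class PrefixDispatch {
    enum Target { LOCAL_SHARED, DEFAULT_DFS }

    // Assumed spelling of the prefix; the comment only calls it "shared:".
    static final String SHARED_PREFIX = "shared:";

    // Locations without the prefix keep today's behavior (treated as DFS paths).
    static Target targetFor(String location) {
        return location.startsWith(SHARED_PREFIX)
                ? Target.LOCAL_SHARED
                : Target.DEFAULT_DFS;
    }

    // Remove the prefix so the remainder can be opened on the local filesystem.
    static String stripPrefix(String location) {
        return location.startsWith(SHARED_PREFIX)
                ? location.substring(SHARED_PREFIX.length())
                : location;
    }
}
```

Under these assumptions, targetFor("shared:/users/craig/in.txt") yields LOCAL_SHARED, while
a plain path such as "/data/in.txt" falls through to DEFAULT_DFS.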

> Dont copy to DFS if source filesystem marked as shared
> ------------------------------------------------------
>
>                 Key: PIG-102
>                 URL: https://issues.apache.org/jira/browse/PIG-102
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>         Environment: Installations with shared folders on all nodes (eg NFS)
>            Reporter: Craig Macdonald
>         Attachments: shared.patch
>
>
> I've been playing with Pig using three setups:
> (a) local
> (b) hadoop mapred with hdfs
> (c) hadoop mapred with file:///path/to/shared/fs as the default file system
> In our local setup, various NFS filesystems are shared between all machines (including
> mapred nodes), e.g. /users, /local
> I would like Pig to note when input files are in a file:// directory that has been marked
> as shared, and hence not copy them to DFS.
> Similarly, the Torque PBS resource manager has a usecp directive, which notes when a
> filesystem location is shared between all nodes (and hence scp is not needed; cp alone can
> be used). See http://www.clusterresources.com/wiki/doku.php?id=torque:6.2_nfs_and_other_networked_filesystems
> It would be good to have a configurable setting in Pig that says when a filesystem is
> shared, and hence no copying between file:// and hdfs:// is needed.
> An example in our setup might be:
> sharedFS file:///local/
> sharedFS file:///users/
> if commands should be used.
> This command should be used with care. Obviously if you have 1000 nodes all accessing
> a shared file in NFS, then it would have been better to "hadoopify" the file.
> The likely area of code to patch is src/org/apache/pig/impl/io/FileLocalizer.java hadoopify(String,
> PigContext)
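
As a rough illustration of the configurable setting requested in the report (not the
contents of shared.patch), a check for shared locations could be sketched like this. The
class and method names are hypothetical, and how the sharedFS entries would be read from
Pig's configuration is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: decide whether a location must be copied to DFS,
// given a list of filesystem roots that are shared between all nodes.
class SharedFsCheck {
    private final List<String> sharedRoots = new ArrayList<>();

    SharedFsCheck(List<String> roots) {
        for (String root : roots) {
            // Normalize to a trailing slash so file:///users does not match file:///users2/
            sharedRoots.add(root.endsWith("/") ? root : root + "/");
        }
    }

    // True if the location lies under one of the configured shared roots.
    boolean isShared(String location) {
        for (String root : sharedRoots) {
            if (location.startsWith(root)) {
                return true;
            }
        }
        return false;
    }

    // Only file: locations outside the shared roots need copying;
    // anything else (for example hdfs://) is assumed to be reachable already.
    boolean needsCopy(String location) {
        return location.startsWith("file:") && !isShared(location);
    }
}
```

With the roots file:///local/ and file:///users/ from the example in the report, a file
under file:///users/ would be read in place, while a file under file:///tmp/ would still be
copied.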

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

