hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Using Hadoop with NFS mounted file server
Date Mon, 14 Aug 2006 20:29:07 GMT
You don't want to use DFS on top of NFS.  If you use DFS, keep its data 
on the local drives, not in NFS.  If you want to use NFS for shared 
data, then simply don't use DFS: specify "local" as the filesystem and 
don't start datanodes or a namenode.
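As a sketch of that setup: in Hadoop 0.4-era releases the filesystem was selected with the fs.default.name property in conf/hadoop-site.xml (check your hadoop-default.xml for the exact name in your release); setting it to "local" makes jobs read and write through the local/NFS filesystem instead of DFS, so no namenode or datanodes are needed.

```xml
<!-- conf/hadoop-site.xml: use the local (here, NFS-mounted) filesystem,
     not DFS. Property name as in Hadoop 0.4-era configs. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
</configuration>
```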

I think you'll find DFS will perform better than NFS for crawling, 
indexing, etc.  If you like, at the end, you could copy the final index 
from DFS onto your NFS server, if that's where you'd prefer to have it.
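That final copy can be done with the DFS shell; something like the following, where the source and destination paths are hypothetical and depend on where your job wrote the index and where your NFS server is mounted:

```shell
# Copy the finished index out of DFS onto the NFS mount.
# Paths are examples only; adjust to your crawl layout and mount point.
bin/hadoop dfs -get crawl/index /mnt/nfs/crawl/index
```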

Does that help?


Adam Taylor wrote:
> Hello, I've started to do some initial test runs with Hadoop 0.4.0, Nutch
> 0.8 and Nutchwax 0.6+. My setup includes several rack-mount servers that
> will be used for distributed indexing, and a clustered file server that is
> NFS-mounted on each server. I would like all of the Hadoop slaves to
> write the index to the file server (instead of to local disk).
>
> I am curious: if the Hadoop master and its slaves will be accessing the same
> file server to store the index, is it possible to run the indexing in
> distributed mode but specify "local" for the file system? I have tried doing
> it this way and couldn't get it to work. All the documentation for Hadoop
> seems to suggest using distributed mode for both the file system and the
> indexing. However, if I try a distributed file system with my setup, each
> slave writes to the same file server, so we get a conflict: "Cannot
> start multiple Datanode instances sharing the same data directory"
>
> Thanks!
> Adam
