hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: FUSE HDFS significantly slower
Date Tue, 26 Oct 2010 20:42:15 GMT

On Oct 26, 2010, at 1:36 PM, Allen Wittenauer wrote:

> 
> On Oct 26, 2010, at 11:25 AM, Hazem Mahmoud wrote:
> 
>> That raises a question that I am currently looking into, and I would appreciate any
>> and all advice people have.
>> 
>> We are replacing our current NetApp solution, which has served us well but we have
>> outgrown it.
>> 
>> I am looking at either upgrading to a bigger and meaner NetApp or possibly going
>> with Hadoop (HDFS and FUSE).
> 
> 	You'd probably be better off looking at something like Ceph or Lustre, which are meant to be
> fully POSIX compliant.
> 

It's the difference between a freight train and a race car.  NetApp/Lustre are race cars;
Hadoop is closer to a freight train, which is what you want if you're moving data to 2000 nodes.

They're two different categories.  It's a mainframe versus a Linux cluster.  If you can transition
your mainframe to a Linux cluster, you probably shouldn't have bought a mainframe in the first
place.

>> I need to mount the "storage solution" (HDFS or SAN) on about 5 or 6 systems. I'm
>> a little concerned about utilizing HDFS/FUSE for a couple of reasons:
>> 1. Performance of FUSE (how does it compare to an iSCSI SAN solution, for example?). I
>> know it probably depends on a lot of things, but I'm asking generally, or for any experiences
>> anyone has had.
> 
> 	FUSE in general (regardless of what you're using with it) is going to be significantly
> slower than a kernel-level file system.
> 

Slower *per node*, but you can still get impressive throughput when you multiply this across 2000
nodes.

> 
>> 2. Security/permissions (the owner of all files shows up as "nobody")
> 
> 	I doubt anyone has spent any time adding security to the HDFS FUSE port.  So even though
> NetApp's Kerberos stack is pretty crappy (3DES only... seriously?), you're going to get a
> better security model with it.
> 

Actually, Unix permissions are there.  If all files show up as "nobody", something has gone
wrong in your install.
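
One quick way to confirm the "nobody" symptom is to tally file owners under the
mount point.  The following is only a minimal sketch and not part of fuse-dfs: the
helper name owner_counts and the mount path /mnt/hdfs are hypothetical, chosen for
illustration.

```python
import os
import pwd
from collections import Counter


def owner_counts(root):
    """Count files and directories under `root`, grouped by owner name."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat so that symlinks are not followed
            st = os.lstat(os.path.join(dirpath, name))
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                # UID has no passwd entry on this host; report the number
                owner = str(st.st_uid)
            counts[owner] += 1
    return counts


if __name__ == "__main__":
    # /mnt/hdfs is an assumed mount point; substitute your own.
    for owner, n in owner_counts("/mnt/hdfs").most_common():
        print(owner, n)
```

If nearly everything is reported as "nobody", that points to an install or
user-mapping problem on the client rather than to missing permission support.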

Strong security (i.e. Kerberos) is probably untested and I suspect it wouldn't work as-is.

>> Another question: Are there other options for mounting HDFS on these 5 or 6 systems
>> for pure filesystem access? (using NFS, etc.)
> 
> 	No.  I keep hoping someone builds a pNFS/NFSv4.1 server on top of Hadoop, but alas not
> yet.


Not yet, but again, we're pretty happy because per-process we use <10MB/s.  So, FUSE is
more than sufficient for our needs.
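
As a back-of-the-envelope illustration of how modest per-process rates add up, the
sketch below multiplies a per-process rate by the number of processes.  The 10 MB/s
figure is the upper bound quoted above and 2000 is the node count mentioned earlier;
one reading process per node is purely an assumption, and the result ignores network
and datanode limits.

```python
def aggregate_mb_s(nodes, procs_per_node, per_proc_mb_s):
    """Upper-bound aggregate read rate in MB/s, assuming perfect scaling."""
    return nodes * procs_per_node * per_proc_mb_s


if __name__ == "__main__":
    # Assumed inputs: 2000 nodes, 1 process each, at most 10 MB/s per process.
    total = aggregate_mb_s(2000, 1, 10.0)
    print(total, "MB/s")          # 20000.0 MB/s
    print(total / 1000, "GB/s")   # 20.0 GB/s
```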

If you're just at the 5-6 node level, I would seriously think about buying a nice big RAID
server from Oracle, running Solaris's nice NFS implementation, and saving some time.

I've attached our Ganglia network graphs below; the data rates really can add up.

Brian

