hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: FUSE HDFS significantly slower
Date Tue, 26 Oct 2010 20:42:15 GMT

On Oct 26, 2010, at 1:36 PM, Allen Wittenauer wrote:

> On Oct 26, 2010, at 11:25 AM, Hazem Mahmoud wrote:
>> That raises a question that I am currently looking into and would appreciate any and all advice people have.
>> We are replacing our current NetApp solution, which has served us well but we have outgrown it.
>> I am looking at either upgrading to a bigger and meaner NetApp or possibly going with Hadoop (HDFS and Fuse).
> 	You'd probably be better off looking at something like Ceph or Lustre, which are meant to be fully POSIX compliant.

It's the difference between a freight train and a race car.  NetApp/Lustre are race cars;
Hadoop is closer to a freight train.  If you're moving data to 2000 nodes.

They're two different categories.  It's a mainframe versus a Linux cluster.  If you can transition
your mainframe to a Linux cluster, you probably shouldn't have bought a mainframe in the first place.

>> I need to mount the "storage solution" (HDFS or SAN) to about 5 or 6 systems. I'm a little concerned about utilizing HDFS/Fuse for a couple of reasons:
>> 1. Performance of Fuse (how does it compare to an iSCSI SAN solution, for example)... I know, it probably depends on a lot of things, but just generally speaking, or any experiences anyone has had
> 	FUSE in general (regardless of what you're using with it) is going to be significantly slower vs. a kernel-level file system.

Slower *per node*, but you can still get impressive throughput when you multiply this by 2000 nodes.

>> 2. Security/permissions (owner of all files shows up as "nobody")
> 	I doubt anyone has spent any time adding security to the HDFS FUSE port.  So even though NetApp's Kerberos stack is pretty crappy (3DES only... seriously?), you're going to get a better security model with it.

Actually, unix permissions are there.  If all files show up as "nobody", something has gone
wrong in your install.
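
One quick way to check what the mount actually reports is to tally file owners under it. This is only a rough sketch: the mount point `/mnt/hdfs` and the helper name `owner_counts` are made up for illustration, and it uses nothing but the Python standard library, so it works on any mounted directory.

```python
import os
import pwd
from collections import Counter


def owner_counts(root: str) -> Counter:
    """Count files and directories under root, grouped by owner name."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                # uid has no local passwd entry; report the raw number
                owner = str(st.st_uid)
            counts[owner] += 1
    return counts


if __name__ == "__main__":
    # "/mnt/hdfs" is a hypothetical mount point; substitute your own.
    print(owner_counts("/mnt/hdfs").most_common())
```

If nearly every entry comes back as "nobody", that points at the install rather than at a missing feature.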

Strong security (i.e. Kerberos) is probably untested and I suspect it wouldn't work as-is.

>> Another question: Are there other options for mounting HDFS on these 5 or 6 systems for pure filesystem access? (using NFS, etc.)
> 	No.  I keep hoping someone builds a pNFS/NFSv4.1 server on top of Hadoop, but alas not

Not yet, but again, we're pretty happy because per-process we use <10MB/s.  So, FUSE is
more than sufficient for our needs.
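
To put rough numbers on that, here is a back-of-the-envelope sketch. The 10 MB/s figure is only the upper bound quoted above, the 2000 is the node count mentioned earlier, and the one-process-per-node default is purely an assumption for illustration; the function name is made up.

```python
def aggregate_throughput_mb_s(per_process_mb_s, nodes, processes_per_node=1):
    """Upper-bound aggregate rate if every process streams at the same rate."""
    return per_process_mb_s * processes_per_node * nodes


if __name__ == "__main__":
    # 10 MB/s per process (an upper bound), 2000 nodes,
    # one process per node (assumed).
    total = aggregate_throughput_mb_s(10, 2000)
    print(f"{total} MB/s, about {total / 1000:.0f} GB/s")
```

Real aggregate rates depend on how many processes are reading at once and on the network, so treat this as a ceiling, not a measurement.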

If you're just at the 5-6 node level, I would seriously think about buying a nice big RAID
server from Oracle, running Solaris's nice NFS implementation, and saving some time.

I've attached our Ganglia network graphs below; the data rates really can add up.

