From: Per Steffensen
Date: Thu, 15 Sep 2011 09:30:03 +0200
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS vs software RAID like md(adm)
Message-ID: <4E71A97B.7040106@designware.dk>
References: <4E70EC26.3000903@designware.dk> <4E70F59D.4050709@darose.net> <4E719F80.2000406@designware.dk>

Norman Maurer wrote:
> You should keep in mind that HDFS is not POSIX-compliant, so you will
> have a hard time using it as a "real fs". I know there is a fuse driver

I guess there are a few solutions: http://wiki.apache.org/hadoop/MountableHDFS
An alternative would be to write the file-accessing code directly against
the HDFS filesystem, or perhaps against a VFS
(http://en.wikipedia.org/wiki/Virtual_file_system) other than the one that
mounting gives us through FUSE
(http://en.wikipedia.org/wiki/Filesystem_in_Userspace) - of course it would
have to be a VFS that has a port to HDFS (e.g. this port
(https://issues.apache.org/jira/browse/HDFS-1213) to Apache Commons VFS
(http://commons.apache.org/vfs/)).

> for it but I would not use it for heavy usage.

OK, thanks. It will be used heavily. A good con.

> Also HDFS is not really
> a good fit for random access at all.

Also a good con.

> If you really need a POSIX fs I would recommend you have a look at
> DRBD or glusterfs.

Thanks. I will have a look at those.

> Bye,
> Norman
>
>
> 2011/9/15 Per Steffensen <steff@designware.dk>:
>
>> David Rosenstrauch wrote:
>>
>>> On 09/14/2011 02:02 PM, Per Steffensen wrote:
>>>
>>>> Hi
>>>>
>>>> If my goal is to have multiple physical disks appear as one big disk with
>>>> redundancy built in, why would I use an HDFS cluster among machines with
>>>> one disk each, instead of using software RAID like md(adm) directly on
>>>> top of the disks? I am looking for pros and cons of the two solutions.
>>>> http://en.wikipedia.org/wiki/RAID#Software-based_RAID
>>>> http://en.wikipedia.org/wiki/Mdadm
>>>>
>>>> Regards, Per Steffensen
>>>>
>>> HDFS was never intended to be a general-purpose file system. It is a
>>> system optimized for a) running map/reduce, and b) holding large files. It
>>> should not be considered as a replacement for RAID.
>>>
>>> DR
>>>
>> Thanks for your reply, David. Even though HDFS wasn't intended to be used for
>> this, I guess it could be. So if we forget for a moment that it was not
>> designed/optimized to be used as a general purpose file system (GPFS), what
>> are the pros and cons of using it as a GPFS with built-in redundancy vs
>> using software RAID? Is HDFS too slow for some kinds of file operations, or
>> what will the problems (and benefits) be? I hope for some input - I need
>> arguments for and against to be used in a discussion with a customer.
>> Thanks!
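
The alternative mentioned in this message, writing the file-accessing code against an abstraction rather than against a mounted filesystem, might look roughly like the sketch below. It is only an illustration under stated assumptions: `FileStore`, `LocalFileStore` and `save_report` are hypothetical names invented here, not part of Hadoop, FUSE or Apache Commons VFS, and only a local-disk backend is shown. An HDFS-backed class would implement the same three methods through whatever HDFS client is available.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class FileStore(ABC):
    """Hypothetical minimal storage interface that application code targets."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def exists(self, path: str) -> bool: ...


class LocalFileStore(FileStore):
    """Backend that keeps files under a root directory on the local disk."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _resolve(self, path: str) -> str:
        # Store paths are interpreted relative to the root directory.
        return os.path.join(self.root, path.lstrip("/"))

    def write(self, path: str, data: bytes) -> None:
        full = self._resolve(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        # Whole-file write, with no seeking and no in-place update.
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path: str) -> bytes:
        with open(self._resolve(path), "rb") as f:
            return f.read()

    def exists(self, path: str) -> bool:
        return os.path.exists(self._resolve(path))


def save_report(store: FileStore, name: str, text: str) -> str:
    """Application code: depends only on FileStore, not on any backend."""
    path = f"/reports/{name}.txt"
    store.write(path, text.encode("utf-8"))
    return path


if __name__ == "__main__":
    store = LocalFileStore(tempfile.mkdtemp())
    report_path = save_report(store, "demo", "hello")
    print(store.read(report_path).decode("utf-8"))
```

The interface is deliberately limited to whole-file reads and writes. That keeps the application away from the random-access and POSIX-specific operations raised as cons earlier in the thread, so swapping the backend later would not require changes to `save_report`.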