hadoop-hdfs-user mailing list archives

From Nathan Rutman <nrut...@gmail.com>
Subject Re: HDFS without Hadoop: Why?
Date Tue, 01 Feb 2011 03:51:19 GMT
On Mon, Jan 31, 2011 at 6:34 PM, Sean Bigdatafun
<sean.bigdatafun@gmail.com> wrote:
> I feel this is a great discussion, so let's think of HDFS' customers.
> (1) MapReduce --- definitely a perfect fit as Nathan has pointed out
I would add the caveat that this depends on your particular weighting
factors of performance, ease of setup, hardware type, sysadmin
sophistication, failure scenarios, and total cost of ownership.  And
that cost is a non-linear function of scale. It's not true that HDFS
is always the best choice even for MapReduce.

> (2) HBase --- it seems HBase (Bigtable's log-structured design) did a great
> job on this. The solution comes out of Google, so it must be right.
I think this attitude is a major factor in why people choose HBase
(and HDFS).  But Google also sits at a particular point on the
many-dimensional factor space I alluded to above.  Best for Google
does not mean best for everyone.

> But would
> Google necessarily have chosen this approach for its Bigtable system had
> GFS not existed in the first place? I.e., can we have an alternative 'best'
> approach?
I bet you can guess my answer :)

> Anything else? I do not think HDFS is a good file system choice for
> enterprise applications.
Certainly not for most.

> On Tue, Jan 25, 2011 at 12:37 PM, Nathan Rutman <nrutman@gmail.com> wrote:
>> I have a very general question on the usefulness of HDFS for purposes
>> other than running distributed compute jobs for Hadoop.  Hadoop and HDFS
>> seem very popular these days, but the use of HDFS for other purposes
>> (database backend, records archiving, etc) confuses me, since there are
>> other free distributed filesystems out there (I personally work on Lustre),
>> with significantly better general-purpose performance.
>> So please tell me if I'm wrong about any of this.  Note I've gathered most
>> of my info from documentation rather than reading the source code.
>> As I understand it, HDFS was written specifically for Hadoop compute jobs,
>> with the following design factors in mind:
>> write-once-read-many (worm) access model
>> use commodity hardware with relatively high failure rates (i.e.,
>> failures are assumed)
>> long, sequential streaming data access
>> large files
>> hardware/OS agnostic
>> moving computation is cheaper than moving data
>> While appropriate for processing many large-input Hadoop data-processing
>> jobs, there are significant penalties to be paid when trying to use these
>> design factors for more general-purpose storage:
>> Commodity hardware requires data replication for safety.  The HDFS
>> implementation has three penalties: storage redundancy, network loading, and
>> blocking writes.  By default, HDFS blocks are replicated 3x: local,
>> "nearby", and "far away" to minimize the impact of data center catastrophe.
>>  In addition to the obvious 3x cost for storage, the result is that every
>> data block must be written "far away" - exactly the opposite of the "Move
>> Computation to Data" mantra.  Furthermore, these over-network writes are
>> synchronous; the client write blocks until all copies are complete on disk,
>> with the longest latency path of 2 network hops plus a disk write gating the
>> overall write speed.   Note that while this would be disastrous for a
>> general-purpose filesystem, with true WORM usage it may be acceptable to
>> penalize writes this way.
> Facebook seems to have a more cost-effective way to do replication, but I am
> not sure about its MapReduce performance -- at the end of the day, there are
> only two 'proper' map-slot machines that can host a 'cheap' mapper
> operation.
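To put rough numbers on the pipelined-write penalty described above, here is a back-of-envelope sketch. The block size is the 2011 HDFS default; the network and disk rates are assumed round figures, not measurements:

```python
# Back-of-envelope model of an HDFS-style synchronous 3x-replicated write,
# following the worst-case path described above: the block crosses two
# network hops ("nearby" and "far away") and must be committed to disk
# before the client write unblocks. All rates are assumptions.

def replicated_write_seconds(block_mb, net_mb_s, disk_mb_s, hops=2):
    """Time until all replicas are durable, serializing the hop and disk costs."""
    return hops * block_mb / net_mb_s + block_mb / disk_mb_s

def local_write_seconds(block_mb, disk_mb_s):
    """Time for an unreplicated write to the local disk only."""
    return block_mb / disk_mb_s

block = 64.0   # default HDFS block size, MB
gige = 119.0   # ~1 GigE payload rate, MB/s (assumed)
disk = 80.0    # commodity SATA streaming write rate, MB/s (assumed)

t_repl = replicated_write_seconds(block, gige, disk)
t_local = local_write_seconds(block, disk)
print(f"local-only write: {t_local:.2f}s, 3x replicated write: {t_repl:.2f}s")
```

Even granting some pipelining overlap that this ignores, the replicated path is a multiple of the local-write time, which is why this trade-off only makes sense under a genuinely write-once workload.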
>> Large block size implies fewer files.  HDFS reaches limits in the tens of
>> millions of files.
>> Large block size wastes space for small files.  The minimum file size is 1
>> block.
>> There is no data caching.  When delivering large contiguous streaming
>> data, this doesn't matter.  But when the read load is random, seeky, or
>> partial, this is a missing high-impact performance feature.
> Yes, can anyone answer this question? -- I want to ask the same question as
> well.

I talked to one of the principal HDFS designers, and he agreed with me
on all these points...
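To make the tens-of-millions-of-files ceiling concrete: every file, directory, and block is an object held in NameNode heap memory. The commonly cited rule of thumb is on the order of 150 bytes per object; treat that constant (and the blocks-per-file ratio) as assumptions, not specs:

```python
# Rough NameNode heap estimate. The ~150 bytes/object figure is a widely
# quoted rule of thumb for Hadoop-era JVMs, not an exact specification.

BYTES_PER_OBJECT = 150       # assumed average per file/directory/block object

def namenode_heap_gb(n_files, blocks_per_file=1.5):
    """Estimate NameNode heap needed to hold the namespace in memory."""
    objects = n_files * (1 + blocks_per_file)   # one inode plus its blocks
    return objects * BYTES_PER_OBJECT / 2**30

for n in (10_000_000, 50_000_000, 100_000_000):
    print(f"{n:>11,} files -> ~{namenode_heap_gb(n):.1f} GB of NameNode heap")
```

Because the whole namespace must fit in one JVM heap, file count (not raw capacity) becomes the scaling wall, and many small files hit it far sooner than a few large ones.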

>> In a WORM model, changing a small part of a file requires all the file
>> data to be copied, so e.g. database record modifications would be very
>> expensive.
> Yes, can anyone answer this question?
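For what it's worth, the cost being asked about is easy to bound: without in-place update, changing one record means reading the whole file back and rewriting it. A sketch with assumed disk rates and a hypothetical 100-byte record:

```python
# Cost of updating one small record under a write-once model versus an
# in-place update. The disk rates and seek time are assumed figures.

def rewrite_update_seconds(file_mb, read_mb_s=100.0, write_mb_s=80.0):
    """WORM path: read the entire file, then write the entire file back."""
    return file_mb / read_mb_s + file_mb / write_mb_s

def in_place_update_seconds(record_kb, write_mb_s=80.0, seek_ms=10.0):
    """Mutable-file path: one seek plus writing only the changed record."""
    return seek_ms / 1000.0 + (record_kb / 1024.0) / write_mb_s

file_mb = 1024.0                              # a 1 GB file
t_worm = rewrite_update_seconds(file_mb)
t_inplace = in_place_update_seconds(0.1)      # a ~100-byte record
print(f"rewrite whole file: {t_worm:.1f}s vs in-place update: {t_inplace*1000:.1f}ms")
```

The gap is three to four orders of magnitude on these assumed numbers, which is why record-update workloads such as database backends sit so poorly on a WORM store.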
>> There are no hardlinks, softlinks, or quotas.
... except that one.  HDFS now does softlinks and quotas.

>> HDFS isn't directly mountable, and therefore requires a non-standard API
>> to use.  (FUSE workaround exists.)
>> Java source code is very portable and easy to install, but not very quick.
>> Moving computation is cheaper than moving data.  But the data nonetheless
>> always has to be moved: either read off of a local hard drive or read over
>> the network into the compute node's memory.  It is not necessarily the case
>> that reading a local hard drive is faster than reading a distributed
>> (striped) file over a fast network.  Commodity network (e.g. 1GigE),
>> probably yes.  But a fast (and expensive) network (e.g. 4xDDR Infiniband)
>> can deliver data significantly faster than a local commodity hard drive.
> I agree with this statement: "It is not necessarily the case that reading a
> local hard drive is faster than reading a distributed (striped) file over a
> fast network" -- probably for Infiniband as well as 10GigE networks. And this
> is why I feel it might not be a good strategy for HBase to tie its design
> entirely to HDFS.

I've proved this to my own satisfaction with a simple TestDFSIO
benchmark on HDFS and Lustre.  I posted the results in another thread.
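The same conclusion falls out of simple line-rate arithmetic. The rates below are nominal link speeds and an assumed commodity-disk streaming rate, not my benchmark numbers:

```python
# Nominal data rates, MB/s. Link figures are approximate payload ceilings
# (line rate minus encoding/framing overhead); the disk figure is an
# assumed commodity-drive streaming rate circa 2011.
rates = {
    "local SATA disk":   100,    # assumed streaming read rate
    "1 GigE network":    119,    # ~125 MB/s line rate minus framing
    "10 GigE network":   1190,
    "4x DDR InfiniBand": 1600,   # ~16 Gb/s effective payload after 8b/10b
}

disk = rates["local SATA disk"]
for name, mb_s in rates.items():
    verdict = "faster" if mb_s > disk else "not clearly faster"
    print(f"{name:>18}: {mb_s:>5} MB/s ({verdict} than the local disk)")
```

On commodity 1 GigE the network and a single disk are roughly at parity, so data locality pays; on a fast interconnect the network can outrun the local drive by an order of magnitude, and the "move computation to data" premise weakens accordingly.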

>> If I'm missing other points, pro or con, I would appreciate hearing
>> them.  Note again I'm not questioning the success of HDFS in achieving its
>> stated design goals, but rather trying to understand HDFS's applicability
>> to other storage domains beyond Hadoop.
>> Thanks for your time.
