hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Using HDFS as native storage
Date Thu, 27 Mar 2008 20:21:15 GMT

We looked seriously at HDFS and MogileFS, and considered (and instantly
rejected) a *bunch* of others.

HDFS was eliminated based on the number of files, the lack of HA, and the lack
of reference implementations serving large-scale web sites directly from it.

Mogile had HA (using crude tools), reference implementations and could
obviously be scaled to pretty large levels.  It also had an implementation
that was simple enough for our guys to fix the defects.

At that point, the time window for further evaluation closed and we had to
make a decision.

On 3/27/08 11:30 AM, "Robert Krüger" <krueger@signal7.de> wrote:

> This might be off-topic, but how would you compare GlusterFS to HDFS and
> MogileFS for such an application? Did you look at it at all and
> decide against it?
> Ted Dunning wrote:
>> We evaluated several options for just this problem and eventually settled on
>> MogileFS.  That said, Mogile needed several weeks of work to get it ready
>> for prime time.  It will work pretty well for modest sized collections, but
>> for our stuff (many hundreds of millions of files, approaching a PB of
>> storage), it just wasn't ready.  The fixes had to do with sharding the name
>> database across many MySQL instances and improving the handling of storage
>> system up-state.
>> On 3/27/08 2:13 AM, "Robert Krüger" <krueger@signal7.de> wrote:
>>> Hi,
>>> we're looking for options for creating a scalable storage solution based
>>> on commodity hardware for media files (dominated space-wise by video files
>>> of a few hundred MB, but also including up to a few million smaller files
>>> such as thumbnails). The system will start with a few TB and should be
>>> able to scale to about a PB.
>>> Is anyone using HDFS for native storage for critical files or is it just
>>> common to use HDFS for large amounts of temporary, more or less
>>> non-critical data? What would be the trade-offs to decide whether to use
>>> HDFS or something like GlusterFS? Note that we're currently not planning
>>> on using MapReduce.
>>> Thanks in advance,
>>> Robert
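
The name-database sharding mentioned in the quoted reply can be sketched
roughly as follows. This is a minimal illustration in Python, not MogileFS's
actual code and not the fix described in the thread; the shard host names,
the choice of MD5, and the function names are all hypothetical. It shows only
the basic idea of routing each file key to one of several database instances
with a stable hash.

```python
import hashlib

# Hypothetical list of name-database instances (not from the original thread).
SHARDS = ["namedb-0", "namedb-1", "namedb-2", "namedb-3"]


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a file key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because the latter is
    randomized per process for strings and would route keys inconsistently.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_name(key: str) -> str:
    """Return the (hypothetical) database instance that owns this key."""
    return SHARDS[shard_for_key(key, len(SHARDS))]


if __name__ == "__main__":
    for key in ("videos/0001.flv", "thumbs/0001.jpg"):
        print(key, "->", shard_name(key))
```

Plain modulo routing like this remaps most keys whenever the number of shards
changes, so a real deployment at the scale discussed here would probably need
a lookup table or consistent hashing instead.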
