hadoop-common-user mailing list archives

From "Alex Loddengaard" <a...@cloudera.com>
Subject Re: Using hadoop as storage cluster?
Date Sat, 25 Oct 2008 08:30:37 GMT
I don't think HDFS would be the ideal DFS for you.  The amount of metadata
associated with even a small file is large, so with many small files you
would most likely bog down your namenode.  HDFS is meant for large files.
Take a look at Wikipedia for more DFS implementations:

<http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_file_systems>

I don't know much about other DFS options, though, so maybe someone else
could say more?
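
To put rough numbers on the namenode concern above, here is a minimal
back-of-the-envelope sketch in Python.  It assumes a commonly quoted rule of
thumb of about 150 bytes of namenode memory per namespace object (file or
block) and a 64MB block size; both figures are assumptions for illustration,
not measurements, and the function names are made up for this example.

```python
# Rough namenode memory estimate for a set of files.
# ASSUMPTION: ~150 bytes of namenode heap per namespace object (file or block).
# This is a rule of thumb, not a measured value for any particular release.
BYTES_PER_OBJECT = 150
# ASSUMPTION: 64MB block size, the figure mentioned elsewhere in this thread.
BLOCK_SIZE = 64 * 1024 * 1024


def blocks_for(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file of file_size bytes occupies (ceiling division)."""
    return -(-file_size // block_size)


def namenode_bytes(file_sizes, block_size=BLOCK_SIZE):
    """Estimated namenode memory: one object per file plus one per block."""
    objects = 0
    for size in file_sizes:
        objects += 1 + blocks_for(size, block_size)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    mib = 1024 * 1024
    # Hypothetical workload: the same ~1TB stored two different ways.
    small = [1 * mib] * 1_000_000          # one million 1MB files
    large = [64 * mib] * (1_000_000 // 64)  # the same data in 64MB files
    print("small files:", namenode_bytes(small), "bytes")
    print("large files:", namenode_bytes(large), "bytes")
```

Under these assumptions the small-file layout needs roughly 64 times as many
namespace objects for the same amount of data, which is the effect described
above.  Directories and replication bookkeeping are ignored here.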

Alex

On Fri, Oct 24, 2008 at 5:42 PM, David C. Kerber <
dkerber@warrenrogersassociates.com> wrote:

> Are there any tuning settings that can be adjusted to optimize for files of
> a given size range?
>
> There would be quite a few files in the 100kB to 2MB range, which are
> received and processed daily, with smaller numbers ranging up to ~600MB or
> so which are summarizations of many of the daily data files, and maybe a
> handful in the 1GB - 6GB range (disk images and database backups, mostly).
> There would also be a few (comparatively few, that is) configuration files
> of a few kB each.
>
> Thanks for the response; do you know of any other systems with similar
> functionality?
>
> Dave
>
>
>
> > -----Original Message-----
> > From: Alex Loddengaard [mailto:alex@cloudera.com]
> > Sent: Friday, October 24, 2008 5:42 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Using hadoop as storage cluster?
> >
> > What files do you expect to be storing?  Generally speaking,
> > HDFS (Hadoop's distributed file system) does not handle small
> > files very efficiently.  Instead it's optimized for large files,
> > upwards of 64MB each.
> >
> > Alex
> >
> > On Fri, Oct 24, 2008 at 9:41 AM, David C. Kerber <
> > dkerber@warrenrogersassociates.com> wrote:
> >
> > > Hi -
> > >
> > > I'm a complete newbie to Hadoop, and am wondering if it's appropriate
> > > for configuring a bunch of older machines that have no other use, for
> > > use as a storage cluster on an otherwise Windows network, so that my
> > > Windows clients see their combined disk space as a single large share?
> > >
> > > If so, will I need additional software to let the Windows clients see
> > > them (like Samba does for a single machine)?  We don't have a lot of
> > > Linux experience in our office, but probably enough to get this going
> > > if it's not too complex; mostly with Ubuntu and Fedora.
> > >
> > > If Hadoop isn't well-suited to this use, or there is something better,
> > > I'm open to suggestions.
> > >
> > > Thanks
> > > ----------------------------------
> > > David Kerber
> > > Warren Rogers Associates
> > > (800)-972-7472 x-111
> > > dkerber@WarrenRogersAssociates.com
> > > www.WarrenRogersAssociates.com
> > > ----------------------------------
> > >
> >
>
