hadoop-common-user mailing list archives

From Mark Kerzner <markkerz...@gmail.com>
Subject Re: Mounting HDFS as local file system
Date Thu, 02 Dec 2010 15:22:14 GMT
Brian,

that almost answers my question. Still, are you saying that the problem of
"Hadoop hates small files" does not exist?

Mark
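As background to Mark's question: the "Hadoop hates small files" concern is usually about NameNode heap rather than on-disk fragmentation, since the NameNode holds every namespace object (file, directory, block) in memory. A rough sketch, assuming the commonly cited figure of about 150 bytes of heap per object (an assumption, not a number from this thread):

```shell
# Assumption: the NameNode keeps every namespace object (file, directory,
# or block) in heap at a commonly cited cost of roughly 150 bytes each.
BYTES_PER_OBJECT=150

# 10 million small files, each small enough to fit in a single block,
# give roughly 20 million objects (one inode + one block per file):
FILES=10000000
OBJECTS=$(( FILES * 2 ))
HEAP_BYTES=$(( OBJECTS * BYTES_PER_OBJECT ))
echo "approx $(( HEAP_BYTES / 1024 / 1024 )) MB of NameNode heap"
```

So tens of millions of small files cost gigabytes of NameNode heap regardless of how little data they hold, which is why quotas on file counts (as Brian describes) matter more than quotas on bytes for this workload.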

On Thu, Dec 2, 2010 at 9:02 AM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:

>
> On Dec 2, 2010, at 8:52 AM, Mark Kerzner wrote:
>
> > Thank you, Brian.
> >
> > I found your paper "Using Hadoop as grid storage," and it was very
> useful.
> >
> > One thing I did not understand in it is your file usage pattern - do you
> > deal with small or large files, and do you delete them often enough? My
> > question was, in part, can you use HDFS as a regular file system with
> > frequent file deletes? Does it not become fragmented and unreliable?
> >
>
> We don't have any fragmentation issues.  We frequently delete files (we're
> supposed to be able to turn over 500TB in 2 weeks).  We use quotas and have
> daily monitoring to watch for users who abuse the system.  The only
> directories without quotas are the ones we populate centrally; user
> directories (who we don't control) can quite easily get 1-20TB, but have to
> provide a strong justification to get more than 10k files.
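The quota policy Brian describes maps onto the standard HDFS admin commands of that era (the paths below are hypothetical examples, not from the thread):

```shell
# Cap a user directory at 10,000 namespace objects (files + directories),
# matching the "10k files" policy described above:
hadoop dfsadmin -setQuota 10000 /user/alice

# Cap raw disk usage; -setSpaceQuota counts bytes across all replicas:
hadoop dfsadmin -setSpaceQuota 20t /user/alice

# Daily monitoring: report quota, remaining quota, and usage per directory:
hadoop fs -count -q /user/alice
```

Note that the space quota is charged against replicated bytes, so a 20t quota holds well under 20 TB of user data at the default replication factor of 3.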
>
> Because HDFS has limited write semantics (but close enough to POSIX read
> semantics) our users love it, but understand it's "special".
>
> It's been a matter of user training:
> - Do you want high performance storage that can do lots of small files?  If
> so, the cost is $X / TB.
> - Do you want high throughput storage where you have limited write
> semantics and need to use large files?  If so, the cost is $Y / TB.
> X is roughly 5-10x Y, so the group leaders can budget appropriately.  We
> then purchase Hadoop and our Other Storage System in appropriate amounts.
>
> User education goes a long way.  However, if they don't want to be bothered
> to be educated, they can always pay more money.
>
> Brian
>
> > Thank you,
> > Mark
> >
> > On Thu, Dec 2, 2010 at 7:10 AM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> >
> >>
> >> On Dec 2, 2010, at 5:16 AM, Steve Loughran wrote:
> >>
> >>> On 02/12/10 03:01, Mark Kerzner wrote:
> >>>> Hi, guys,
> >>>>
> >>>> I see that there is MountableHDFS <http://wiki.apache.org/hadoop/MountableHDFS>,
> >>>> and I know that it works, but my questions are as follows:
> >>>>
> >>>>   - How reliable is it for large storage?;
> >>>
> >>> Shouldn't be any worse than normal HDFS operations.
> >>>
> >>>>   - Is it not hiding the regular design questions - we are dealing with
> >>>>   the NameNode after all, but are trying to use it as a regular file
> >>>>   system?
> >>>>   - For example, HDFS is not optimized for many small files that get
> >>>>   written and deleted, but a mounted system will lure one in this
> >>>>   direction.
> >>>
> >>> Like you say, it's not a conventional POSIX fs; it hates small files,
> >>> where other things may be better.
> >>
> >> I would comment that it's extremely reliable.  There's at least one slow
> >> memory leak in fuse-dfs that I haven't been able to squash, and I
> >> typically remount things after a month or two of *heavy* usage.
> >>
> >> Across all the nodes in our cluster, we probably do a few billion HDFS
> >> operations per day over FUSE.
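For reference, the MountableHDFS wiki page linked earlier in the thread describes mounting roughly as follows (hostname, port, and mount point here are hypothetical, and fuse-dfs must first be built per that page's instructions):

```shell
# Mount HDFS at /mnt/hdfs via fuse-dfs (hypothetical NameNode host/port):
mkdir -p /mnt/hdfs
./fuse_dfs_wrapper.sh dfs://namenode.example.com:9000 /mnt/hdfs

# Ordinary POSIX reads then work through the mount point:
ls /mnt/hdfs

# Unmount when done (or to clear the slow leak Brian mentions):
fusermount -u /mnt/hdfs
```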
> >>
> >> Brian
>
>
