hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Optimal Filesystem (and Settings) for HDFS
Date Wed, 20 May 2009 02:46:47 GMT
I always disable atime and its ilk.
The deadline scheduler helps with the du datanode timeout issues (the ones
not caused by XFS hanging), but not much.
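
For anyone who wants to check both settings on a node, here is a minimal
Python sketch. It only parses text in the format of /proc/mounts and of
/sys/block/<dev>/queue/scheduler; the "/data" prefix and both function names
are hypothetical, so adjust them to wherever your datanode disks are mounted.

```python
def mounts_missing_noatime(mounts_text, prefixes=("/data",)):
    """Return mount points under the given prefixes whose options lack noatime.

    mounts_text is the content of /proc/mounts; the "/data" default prefix
    is an assumption, not something taken from this thread.
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point.startswith(tuple(prefixes)) and "noatime" not in options:
            missing.append(mount_point)
    return missing


def active_scheduler(scheduler_text):
    """Return the scheduler shown in brackets, e.g. 'deadline'.

    scheduler_text is the content of /sys/block/<dev>/queue/scheduler,
    where the kernel marks the active choice with square brackets.
    """
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None
```

On a live node you would pass in open("/proc/mounts").read() and the
contents of the scheduler file for each data disk.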

Ultimately that is a caching failure in the kernel, caused by Hadoop's I/O
patterns.

Anshu, any luck getting off the PAE kernels? Is this the XFS lockup, or just
the du taking too long?

At one point, Sagar and I talked about replacing the du call with a script
that used df as a rapid and close proxy, to get rid of the du calls. The
block report was another problem.
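
To illustrate the idea (this is not the script we discussed, just a minimal
Python sketch with hypothetical function names): a du-style number requires
touching every file under the directory, while a df-style number is a single
statvfs call on the filesystem. The df figure covers the whole filesystem,
so it is only a close proxy when the disk is dedicated to HDFS data.

```python
import os


def walked_bytes(path):
    """du-style figure: walk the tree and sum file sizes.

    This sums apparent sizes (st_size), not allocated blocks, so it can
    differ slightly from what du -sk prints. Cost grows with file count.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def df_used_bytes(path):
    """df-style figure: one statvfs call for the containing filesystem.

    Counts everything on the filesystem, not only the given directory.
    """
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize
```

On a data directory holding many block files, the first call has to stat
every file, which is where the time goes; the second returns immediately
regardless of how many files exist.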

On Tue, May 19, 2009 at 3:59 PM, Anshuman Sachdeva
<asachdeva@attributor.com> wrote:

> Hi Bryan,
>         thanks for the mail. I have an issue when we use XFS. Hadoop runs
> du -sk every 10 minutes on my cluster, and sometimes it gets stuck in a
> loop and the machine hangs. Have you seen this issue, or is it only me?
>
> I'd really appreciate it if someone could shed some light on this.
>
>
> Anshuman
> ----- Original Message -----
> From: "Bryan Duxbury" <bryan@rapleaf.com>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, May 19, 2009 2:50:57 PM GMT -08:00 US/Canada Pacific
> Subject: Re: Optimal Filesystem (and Settings) for HDFS
>
> We use XFS for our data drives, and we've had somewhat mixed results.
> One of the biggest pros is that XFS has more free space than ext3,
> even with the reserved space settings turned all the way to 0.
> Another is that you can format a 1TB drive as XFS in about 0 seconds,
> versus minutes for ext3. This makes it really fast to kickstart our
> worker nodes.
>
> We have seen some weird stuff happen though when machines run out of
> memory, apparently because the XFS driver does something odd with
> kernel memory. When this happens, we end up having to do some fscking
> before we can get that node back online.
>
> As far as outright performance, I actually *did* do some tests of xfs
> vs ext3 performance on our cluster. If you just look at a single
> machine's local disk speed, you can write and read noticeably faster
> when using XFS instead of ext3. However, the reality is that this
> extra disk performance won't have much of an effect on your overall
> job completion performance, since you will find yourself network
> bottlenecked well in advance of even ext3's performance.
>
> The long and short of it is that we use XFS to speed up our new
> machine deployment, and that's it.
>
> -Bryan
>
> On May 18, 2009, at 10:31 AM, Alex Loddengaard wrote:
>
> > I believe Yahoo! uses ext3, though I know other people have said that
> > XFS has performed better in various benchmarks.  We use ext3, though we
> > haven't done any benchmarks to prove its worth.
> >
> > This question has come up a lot, so I think it'd be worth doing a
> > benchmark and writing up the results.  I haven't been able to find a
> > detailed analysis / benchmark writeup comparing various filesystems,
> > unfortunately.
> >
> > Hope this helps,
> >
> > Alex
> >
> > On Mon, May 18, 2009 at 8:54 AM, Bob Schulze
> > <b.schulze@ecircle.com> wrote:
> >
> >> We are currently rebuilding our cluster - has anybody
> >> recommendations on the underlying file system? Just standard Ext3?
> >>
> >> I could imagine that the block size could be larger than its
> >> default...
> >>
> >> Thx for any tips,
> >>
> >>        Bob
> >>
> >>
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
