hadoop-common-dev mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Re: Hadoop Distributed File System requirements on Wiki
Date Thu, 06 Jul 2006 20:56:01 GMT
Eric,

Thanks - response embedded below.

One more suggestion: store a copy of the per-block metadata on the
datanode. It doesn't need an up-to-date copy of the filename; just
the "original file name" and block offset would be fine. Since you're
adding truncation features, you'd want some kind of truncation
generation number too. This would make a distributed namenode
recovery possible, which is belt-and-suspenders valuable even after
adding checkpointing features to the namenode. Storing this metadata
is more important than writing the recovery program, since the
recovery program could be written after the disaster that makes it
necessary. (Just a suggestion.)
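
To make that concrete, here's a rough sketch of the kind of sidecar
record I mean (all names are made up, this isn't an existing datanode
format):

// Rough sketch (made-up names, not an existing datanode format) of the
// per-block record a datanode could persist next to each block file.
public class BlockSidecar implements java.io.Serializable {
  long blockId;              // id of the block this record describes
  String originalFileName;   // file path at the time the block was written
  long blockOffset;          // byte offset of this block within that file
  long truncationGeneration; // bumped each time the block is truncated

  // Write the record next to the block file, e.g. blk_<id>.info
  void save(java.io.File dir) throws java.io.IOException {
    java.io.File f = new java.io.File(dir, "blk_" + blockId + ".info");
    java.io.ObjectOutputStream out =
        new java.io.ObjectOutputStream(new java.io.FileOutputStream(f));
    try {
      out.writeObject(this);
    } finally {
      out.close();
    }
  }
}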

On 7/6/06, Eric Baldeschwieler <eric14@yahoo-inc.com> wrote:
>
> On Jul 6, 2006, at 12:02 PM, Paul Sutter wrote:
>
> ...
> > *Constant size file blocks (#16),  -1*
> >
> > I vote to keep variable size blocks, especially because you are adding
> > atomic append capabilities (#25). Variable length blocks create the
> > possibility for blocks that contain only whole records. This:
> > - improves recoverability for large important files with one or more
> > irrevocably lost blocks, and
> > - makes it very clean for mappers to process local data blocks
>
> ...  I think we can achieve our goal without compromising yours.
> Each block can be of any size up to the file's fixed block size.  The
> system can be aware of that and provide an API to report gaps and/or
> an API option to skip them or see them as NULLs.  This reporting can
> be done at the datanode level allowing us to remove all the size data
> & logic at the namenode level.
>
> ** If you agree, why don't we just add the above annotation to
> Konstantin's doc?

Wow! Good idea, and now I see why you wanted to make the change in the
first place. I agree, please go ahead and add.

Incidentally, it's probably fine if
- the API just skips the ghost bytes,
- programs using such files only ever seek to locations that were
returned by getPos(), and
- getPos() returns the byte offset of the next block as soon as a
ghost byte is reached.

I think existing programs will work fine within these restrictions.
The last one is intended for code like SequenceFile that checks the
current position against the file length when reading data.
(SequenceFile's syncing code might need to be reconsidered, but it
would get simpler, since you'd just advance to the next block on a
checksum failure.)
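
To illustrate the third point with a toy example (plain arithmetic,
not actual DFS code), this is the contract I have in mind for getPos()
once the real bytes of a block run out:

// Toy illustration (not DFS code) of the proposed getPos() behavior:
// once the real bytes of a block are exhausted, the reported position
// jumps to the start of the next block, so callers never see an offset
// that falls inside the ghost bytes.
public class GhostByteMath {
  // Round pos up to the start of the next block.
  static long nextBlockStart(long pos, long blockSize) {
    return ((pos / blockSize) + 1) * blockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64L * 1024 * 1024;   // the file's fixed block size
    long realBytes = 40L * 1024 * 1024;   // this block was closed early
    long posAfterRealData = 2 * blockSize + realBytes;
    // getPos() reports the next block's start, not a ghost-byte offset:
    System.out.println(nextBlockStart(posAfterRealData, blockSize));
  }
}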

> > *Recoverability and Availability Goals*
> ...
> > **
> > *Backup Scheme*
> > **
> > We might want to start discussion of a backup scheme for HDFS,
> > especially
> > given all the courageous rewriting and feature-addition likely to
> > occur.
>
> ** I agree, this needs to be on the list.  I'm imagining a command
> that hardlinks every datanode's (and namenode's if needed) files into
> a snapshot directory.  And another command that moves all current
> state into a snapshot directory and hardlinks a snapshot's state back
> into the working directory.  This would be very fast and not cost
> much space in the short term.  Thoughts?  (yes, hardlinks are a pain
> on the PC, we can discuss design later)

This is a fantastic idea.

But as far as calming my fears goes, I'll feel safer with key data
backed up in a filesystem that is not DFS, as pedestrian as that
sounds. :)
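
For the datanode side, I'm picturing something along these lines per
data directory (made-up class, Unix "ln" only, just a sketch of the
idea rather than a proposed implementation):

// Rough sketch of the snapshot command: hardlink every block file in a
// datanode's data directory into a snapshot directory. Class and paths
// are made up for illustration; this is not an existing DFS command.
import java.io.File;
import java.io.IOException;

public class SnapshotSketch {
  static void snapshot(File dataDir, File snapshotDir)
      throws IOException, InterruptedException {
    snapshotDir.mkdirs();
    File[] blocks = dataDir.listFiles();
    if (blocks == null) return;
    for (File blk : blocks) {
      // "ln" creates a hard link: fast, and no extra space is used until
      // a block is rewritten or deleted in the working directory.
      Process p = new ProcessBuilder("ln", blk.getPath(),
          new File(snapshotDir, blk.getName()).getPath()).start();
      if (p.waitFor() != 0) {
        throw new IOException("failed to link " + blk.getPath());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    snapshot(new File(args[0]), new File(args[1]));
  }
}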

> > *Rebalancing (#22,#21)*
> >
> > I would suggest that keeping disk usage balanced is more than a
> > performance
> > feature, it's important for the success of running jobs with large map
> > outputs or large sorts. Our most common reducer failure is running
> > out of
> > disk space during sort, and this is caused by imbalanced block
> > allocation.
>
> ** Good point.  Any interest in helping us with this one?

We'll take a look at it.
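
To give a sense of the direction we'd start from (made-up types,
nothing taken from the current codebase), even something as simple as
preferring the datanode with the most remaining space when placing a
block would help:

// Back-of-the-envelope sketch of space-aware block placement: prefer the
// datanode with the most remaining space instead of choosing uniformly
// at random. The NodeInfo type is made up for illustration.
import java.util.List;

public class BalancedChooser {
  static class NodeInfo {
    final String name;
    final long remainingBytes;
    NodeInfo(String name, long remainingBytes) {
      this.name = name;
      this.remainingBytes = remainingBytes;
    }
  }

  // Pick the target with the most free space for the next block replica.
  static NodeInfo chooseTarget(List<NodeInfo> candidates) {
    NodeInfo best = null;
    for (NodeInfo n : candidates) {
      if (best == null || n.remainingBytes > best.remainingBytes) {
        best = n;
      }
    }
    return best;
  }
}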

> >
> > On 6/30/06, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
> >>
> >> I've created a Wiki page that summarizes DFS requirements and
> >> proposed
> >> changes.
> >> This is a summary of discussions held in this mailing list and
> >> additional internal discussions.
> >> The page is here:
> >>
> >> http://wiki.apache.org/lucene-hadoop/DFS_requirements
> >>
> >> I see there is an ongoing related discussion in HADOOP-337.
> >> We prioritized our goals as
> >> (1) Reliability (which includes Recoverability and Availability)
> >> (2) Scalability
> >> (3) Functionality
> >> (4) Performance
> >> (5) other
> >> But then gave higher priority to some features like the append
> >> functionality.
> >>
> >> Happy holidays to everybody.
> >>
> >> --Konstantin Shvachko
> >>
>
>
