hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1780) reduce need to rewrite fsimage on startup
Date Thu, 24 Mar 2011 16:54:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010762#comment-13010762 ]

Matt Foley commented on HDFS-1780:
----------------------------------

>> Writing image is a (small) fraction of the other start up components. You can find
>> the startup timeline numbers in other jiras. What is the high level problem you are
>> solving?

With other improvements underway, large-cluster startup time is down to about 30 minutes.
Of this, about 5 minutes is spent writing the new FSImage files, even after the improvements
of HDFS-1071. So writing the image has become a significant, if not huge, part of the
startup time.

>> For those who have been running hadoop for 4+ years, Namenode being able to write
>> back updated fsimage back saved us during upgrades. Please don't remove this
>> completely, make it optional.

Completely agree that backup copies of this info are vital.  However:

(1) Since the Edits files are also replicated, it is reasonable to think that having a
matched set of FSImage & Edits is sufficient for this protection; it is not vital to have
them compacted into an updated FSImage. I believe the proposal is not to eliminate redundant
backups. It is simply to stop treating the compacting operation as something that must be
done at startup time.

(2) Most of us running production clusters use Checkpoint Namenodes to do the compacting
operation (combining the FSImage + Edits => new FSImage, and writing out redundant copies
of the new FSImage) in the background, in order to keep the size of the Edits logs under
control. That makes it even less important to do a compaction operation during startup. In
fact, it seems to me that only sites that do NOT use any sort of Checkpoint Namenode
actually have any need to do compaction from the Primary Namenode, at startup or otherwise.

So I think Daryn's suggestion is worthwhile.  Doing the compaction at startup should still
remain an option, for sites not using Checkpoint Namenodes.
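
To make the "remain an option" idea concrete, here is a minimal sketch of how such a switch
might be read. It is only an illustration: the class name and the property key are invented
for this example and are not real HDFS settings, and plain java.util.Properties stands in
for the Hadoop configuration classes. The default is assumed to be "compact at startup", so
sites without a Checkpoint Namenode would keep today's behavior unless they opt out.

```java
import java.util.Properties;

/** Hypothetical sketch; not part of Hadoop. */
final class StartupCompactionOption {
    // Invented key for illustration only; not an actual HDFS property name.
    static final String KEY = "example.namenode.startup.compact.image";

    private StartupCompactionOption() {}

    /**
     * Returns true when the namenode should rewrite (compact) the image at startup.
     * Assumed default is true, which preserves the existing behavior.
     */
    static boolean compactAtStartup(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "true"));
    }
}
```

A site that relies on a Checkpoint Namenode for compaction would set the (hypothetical) key
to "false"; everyone else would leave it unset.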

> reduce need to rewrite fsimage on startup
> -----------------------------------------
>
>                 Key: HDFS-1780
>                 URL: https://issues.apache.org/jira/browse/HDFS-1780
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Daryn Sharp
>
> On startup, the namenode will read the fs image, apply edits, then rewrite the fs image.
> This requires a non-trivial amount of time for very large directory structures. Perhaps
> the namenode should employ some logic to decide that the edits are simple enough that
> rewriting the image back out to disk isn't warranted.
> A few ideas:
> Use the size of the edit logs: if the size is below a threshold, assume it's cheaper
> to reprocess the edit log instead of writing the image back out.
> Time the processing of the edits, and if the time is below a defined threshold, the image
> isn't rewritten.
> Time the reading of the image and the processing of the edits. Base the decision
> on the time it would take to write the image (a multiplier is applied to the read time?)
> versus the time it would take to reprocess the edits. If a certain threshold (perhaps
> percentage or expected time to rewrite) is exceeded, rewrite the image.
> Something along the lines of the last suggestion may allow for defaults that adapt to
> any size cluster, thus eliminating the need to keep tweaking a cluster's settings based on
> its size.
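
The last idea in the description above could be sketched roughly as follows. This is an
illustrative sketch only, not code from HDFS or from any patch on this issue: the class
name, method names, and both tuning parameters are assumptions. It estimates the image
write time as a multiplier of the measured read time, then rewrites the image only when
replaying the edits cost at least a chosen fraction of that estimate.

```java
/** Hypothetical sketch of an adaptive rewrite decision; not part of Hadoop. */
final class ImageRewritePolicy {
    // Assumed multiplier: estimated write time = writeToReadRatio * measured read time.
    private final double writeToReadRatio;
    // Assumed threshold: rewrite when editsReplayTime / estimatedWriteTime >= this value.
    private final double rewriteThreshold;

    ImageRewritePolicy(double writeToReadRatio, double rewriteThreshold) {
        if (writeToReadRatio <= 0 || rewriteThreshold <= 0) {
            throw new IllegalArgumentException("ratio and threshold must be positive");
        }
        this.writeToReadRatio = writeToReadRatio;
        this.rewriteThreshold = rewriteThreshold;
    }

    /** Estimates how long writing the image would take, from the measured read time. */
    long estimatedWriteMillis(long imageReadMillis) {
        return Math.round(imageReadMillis * writeToReadRatio);
    }

    /**
     * Returns true when replaying the edits was expensive enough, relative to the
     * estimated cost of writing the image, that rewriting the image is worthwhile.
     */
    boolean shouldRewrite(long imageReadMillis, long editsReplayMillis) {
        long estimatedWrite = estimatedWriteMillis(imageReadMillis);
        return editsReplayMillis >= rewriteThreshold * estimatedWrite;
    }
}
```

Because both inputs are measured on the running cluster, the same ratio and threshold scale
with image size, which is the "defaults that adapt" property the description asks for. For
example, with a ratio of 1.5 and a threshold of 0.25, an image that took 200 seconds to read
would be rewritten only if edit replay took 75 seconds or more.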

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
