hadoop-common-dev mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Hadoop Distributed File System requirements on Wiki
Date Sat, 15 Jul 2006 01:53:06 GMT
I have updated the DFS requirements wiki page to reflect the discussion 
that took place.


The changes are:
 - two new requirements on recovery and availability;
 - a clarification for item #16;
 - and three additional tasks that need to be prioritized.

What do people think about the priorities of these three and the other tasks?



Paul Sutter wrote:

> thanks! comments below...
> On 7/7/06, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
>> Paul Sutter wrote:
>> > One more suggestion: store a copy of the per-block metadata on the
>> > datanode...
>> Being able to reconstruct the system even if the checkpoint is lost
>> forever is
>> a nice feature to have. The "original file name" can be placed into the
>> crc file (#15)...
> Great place to put it, very nice.
>> > *Recoverability and Availability Goals*
>> >
>> > You might want to consider adding recoverability and availability 
>> goals.
>> This is an interesting observation. Ideally, we would like to save and
>> replicate the fs image file as soon as the edits file reaches a specific
>> size, and we would like to make edits file updates transactional, with
>> the file system locked for updates during the transaction. This would be
>> the zero-loss recoverability goal in your terms.
>> Are we willing to weaken this requirement in favor of performance?
> Actually it's OK for me if we lose even an hour of data on a namenode
> crash, since I can just resubmit the recent jobs. Less loss is better,
> but my suggestion would be to favor simplicity over absolute recovery
> if that's a tradeoff. Others might feel differently about acceptable
> levels of data loss.
>> > Availability goals are probably less stringent than for most storage
>> > systems
>> > (dare I say that a few hours downtime is probably OK) Adding these
>> > goals to
>> > the document could be valuable for consensus and prioritization.
>> If I understood you correctly, this goal is more related to a specific
>> installation of
>> the system rather than to the system itself as a software product.
>> Or do you mean that the total time spent by the system on 
>> self-maintenance
>> procedures like backups and checkpointing should not exceed 2 hours a 
>> day?
>> In any case, I agree, high availability should be mentioned, probably 
>> in the
>> "Feature requirements" section.
> It's about features. Is namenode failover automatic or manual? If it's
> manual, it takes time. And it should definitely be manual for now.
> Seamless namenode failover done right is a lot of work, and
> unnecessary.
> With manual failover, what is the downtime when a namenode fails?
> Well, I imagine that you'd want to take everything down, bring the
> filesystem up in safe mode (nice feature!) on the new namenode, and do
> some kind of fscheck. And then, when you're comfortable that
> everything is copacetic, all your files are present, and that the
> filesystem won't do a radical dereplication of every block when you
> make it writable, you make it writable. (In fact, the secondary
> namenode might always come up in safe mode until manually changed.)
> How long does this take? Well, during this time the system is
> unavailable. And if it fails at 2AM, you're probably not back up
> before 10AM.
> But that's OK. Better to be down for a few hours (manual failover) than
> to have a complex system likely to break (seamless automatic
> failover).
>> >> > *Backup Scheme*
>> >> > We might want to start discussion of a backup scheme for HDFS,
>> >> > especially
>> >> > given all the courageous rewriting and feature-addition likely to
>> >> > occur.
>> >>...
>> >
>> > But as for covering my fears, I'll feel safer with key data backed up
>> > in a filesystem that is not DFS, as pedestrian as that sounds. :)
>> Frankly speaking, I've never thought about backing up a 10 PB storage
>> system. How much space will that require? Isn't it easier just to
>> increase the replication factor? Just a thought...
> Increasing replication doesn't protect me against a filesystem bug.
> I'm a nervous nelly on this one: file system revisions do scare me,
> and I don't have a 10PB system. Let's say I have a 100TB system, and
> that to get back into production I need only restore 5TB worth of
> critical files. Then once I'm back in production I can gradually
> restore the next 25TB and regenerate the rest.
> It's feasible and probably prudent. It's not that I'm expecting data
> loss bugs in new code. My concern is less about the likelihood of the
> problem, and more about the severity of the problem.
> To back up a 10PB system, you would want to back it up to a second
> 10PB system located on an opposite coast. In fact if this system is
> important to your business, you must do this. And then there is the
> question, do you stagger software updates on these two systems?
> Probably.
> You might want to find someone from EMC or Netapp, and get their
> feedback on how software changes, QA, and beta testing are handled
> (including timelines). Storage systems are really a risky type of code
> to modify, for lots of reasons more apparent to the downstream
> consumers than to developers. :)
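The checkpoint policy discussed above (save the image once the edits log passes a size threshold, with updates locked during the roll) can be sketched as below. This is an illustration only: the file names, the threshold, and the append-as-merge are hypothetical stand-ins, not the real namenode logic or on-disk layout.

```shell
#!/bin/sh
# Illustrative sketch: roll the edits log into the image once it passes
# a size threshold. Names and the merge step are hypothetical.
EDITS=edits.log
IMAGE=fsimage
LIMIT=1024   # roll threshold in bytes (hypothetical)

# demo setup: an image plus an edits log that has grown past the limit
printf 'image-v1\n' > "$IMAGE"
head -c 2048 /dev/zero | tr '\0' 'e' > "$EDITS"

size=$(wc -c < "$EDITS")
if [ "$size" -ge "$LIMIT" ]; then
    # the real system would lock the namespace for updates here,
    # making the roll transactional...
    cat "$EDITS" >> "$IMAGE"   # fold the edits into the checkpoint image
    : > "$EDITS"               # start a fresh, empty edits log
    # ...then replicate the new image before unlocking
fi
```

The tradeoff Paul raises is exactly how often this roll runs: a smaller threshold means less data at risk on a crash, at the cost of more frequent pauses.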
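The manual-failover sequence Paul outlines (take everything down, bring the filesystem up in safe mode on the new namenode, check it, then make it writable) could be scripted roughly as follows. The command names here follow later Hadoop CLIs and are illustrative, not necessarily the tools of this era; DRY_RUN=1 (the default) only prints the plan.

```shell
#!/bin/sh
# Sketch of a manual namenode-failover runbook. Command names are
# illustrative of later Hadoop CLIs; the era's actual tools may differ.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

run bin/stop-dfs.sh                       # take everything down
run bin/start-dfs.sh                      # bring dfs up on the new namenode
run bin/hadoop dfsadmin -safemode enter   # stay read-only during checks
run bin/hadoop fsck /                     # verify files and block replicas
# only once the operator is satisfied nothing will de-replicate:
run bin/hadoop dfsadmin -safemode leave   # make the filesystem writable
```

Keeping the last step manual is the point of Paul's argument: the hours spent between fsck and `-safemode leave` are the price of avoiding a complex automatic-failover mechanism.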
