hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Hadoop Distributed File System requirements on Wiki
Date Sun, 09 Jul 2006 06:34:03 GMT
I think we need to have task(s) on the list detailing upgrades.
Also process around releasing filesystem changes that change durable  
And testing...

On Jul 7, 2006, at 12:32 PM, Konstantin Shvachko wrote:

> Paul Sutter wrote:
>> On 7/7/06, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
>>> > *Recoverability and Availability Goals*
>>> >
>>> > You might want to consider adding recoverability and
>>> > availability goals.
>>> This is an interesting observation. Ideally, we would like to save and
>>> replicate fs image file as soon as the edits file reaches a specific size,
>>> and we would like to make edits file updates transactional, with the file
>>> system locked for updates during the transaction. This would be the zero
>>> recoverability goal in your terms.
>>> Are we willing to weaken this requirement in favor of the performance?
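
[Editorial sketch] The size-triggered checkpoint described above can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: the class `NameNodeLog`, its fields, and the `max_edits_bytes` threshold are hypothetical names chosen for illustration and are not taken from the Hadoop code base. Each update is appended to an in-memory edits log, and once the log reaches the threshold it is folded into the image and cleared.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeLog:
    """Toy model of an fs image plus an edits log (all names hypothetical)."""

    max_edits_bytes: int                        # assumed checkpoint threshold
    image: dict = field(default_factory=dict)   # last checkpointed namespace
    edits: list = field(default_factory=list)   # updates since the checkpoint
    edits_bytes: int = 0
    checkpoints: int = 0

    def apply(self, path: str, value: str) -> None:
        """Append one update to the edits log; checkpoint if it grew too big."""
        record = f"{path}={value}\n"
        self.edits.append((path, value))
        self.edits_bytes += len(record.encode("utf-8"))
        if self.edits_bytes >= self.max_edits_bytes:
            self.checkpoint()

    def checkpoint(self) -> None:
        """Fold the edits into the image and start a fresh edits log."""
        for path, value in self.edits:
            self.image[path] = value
        self.edits.clear()
        self.edits_bytes = 0
        self.checkpoints += 1

    def recovered_state(self) -> dict:
        """State rebuilt after a restart: the image with the edits replayed."""
        state = dict(self.image)
        state.update(self.edits)
        return state
```

A smaller threshold means more frequent checkpoints and less to replay on restart, which is the performance versus recoverability tradeoff raised in the question above.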
>> Actually it's OK for me if we lose even an hour of data on a namenode
>> crash, since I can just resubmit the recent jobs. Less loss is better,
>> but my suggestion would be to favor simplicity over absolute recovery
>> if that's a tradeoff. Others might feel differently about acceptable
>> levels of data loss.
> I agree, simplicity is also very important.
>>> > Availability goals are probably less stringent than for most storage
>>> > systems (dare I say that a few hours downtime is probably OK). Adding
>>> > these goals to the document could be valuable for consensus and
>>> > prioritization.
>>> If I understood you correctly, this goal is more related to a specific
>>> installation of the system rather than to the system itself as a software
>>> product. Or do you mean that the total time spent by the system on
>>> self-maintenance procedures like backups and checkpointing should not
>>> exceed 2 hours a day?
>>> In any case, I agree, high availability should be mentioned, probably in
>>> the "Feature requirements" section.
>> It's about features. Is namenode failover automatic or manual? If it's
>> manual, it takes time. And it should definitely be manual for now.
>> Seamless namenode failover done right is a lot of work, and
>> unnecessary.
>> With manual failover, what is the downtime when a namenode fails?
>> Well, I imagine that you'd want to take everything down, bring the
>> filesystem up in safe mode (nice feature!) on the new namenode, and do
>> some kind of fscheck. And then, when you're comfortable that
>> everything is copacetic, all your files are present, and that the
>> filesystem won't do a radical dereplication of every block when you
>> make it writable, you make it writable. (In fact, the secondary
>> namenode might always come up in safe mode until manually changed).
>> How long does this take? Well, during this time the system is
>> unavailable. And if it fails at 2AM, you're probably not back up
>> before 10AM.
>> But that's OK. Better to be down for a few hours (manual failover) than
>> to have a complex system likely to break (seamless automatic
>> failover).
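
[Editorial sketch] The manual failover sequence described above (bring the new namenode up in safe mode, run checks, then let an operator make it writable) can be modeled as a small state machine. This is a hedged sketch only: `StandbyNameNode`, `Mode`, `record_check`, and the `max_under_replicated` threshold are hypothetical names invented for illustration and do not correspond to any Hadoop API.

```python
from enum import Enum


class Mode(Enum):
    DOWN = "down"
    SAFE = "safe"          # read-only; no replication or deletion decisions
    WRITABLE = "writable"


class StandbyNameNode:
    """Toy model of a namenode that stays in safe mode until an operator acts."""

    def __init__(self) -> None:
        self.mode = Mode.DOWN
        self.checks_passed = False

    def start(self) -> None:
        """Always come up in safe mode, never directly writable."""
        self.mode = Mode.SAFE
        self.checks_passed = False

    def record_check(self, missing_files: int,
                     under_replicated_fraction: float,
                     max_under_replicated: float = 0.01) -> None:
        """Store the result of a filesystem check (threshold is an assumption)."""
        self.checks_passed = (
            missing_files == 0
            and under_replicated_fraction <= max_under_replicated
        )

    def leave_safe_mode(self, operator_confirmed: bool) -> bool:
        """Become writable only if checks passed and the operator confirmed."""
        if self.mode is not Mode.SAFE:
            raise RuntimeError("namenode is not in safe mode")
        if not (self.checks_passed and operator_confirmed):
            return False
        self.mode = Mode.WRITABLE
        return True
```

The design choice mirrored here is that nothing leaves safe mode automatically: a passing check is necessary but not sufficient, and the operator's explicit confirmation is the final gate.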
> That's a good point. We should probably add a task to define/describe
> manual failover procedures and to evaluate the availability goal that we
> can reasonably guarantee.
>>> >> > *Backup Scheme*
>>> >> > We might want to start discussion of a backup scheme for HDFS,
>>> >> > especially given all the courageous rewriting and feature-addition
>>> >> > likely to occur.
>>> >>...
>>> >
>>> > But as for covering my fears, I'll feel safer with key data backed up
>>> > in a filesystem that is not DFS, as pedestrian as that sounds. :)
>>> Frankly speaking I've never thought about a backup of a 10 PB storage
>>> system. How much space will that require? Isn't it easier just to
>>> increase the replication factor? Just a thought...
>> Increasing replication doesn't protect me against a filesystem bug.
>> I'm a nervous nelly on this one: file system revisions do scare me,
>> and I don't have a 10PB system. Let's say I have a 100TB system, and
>> that to get back into production I need only restore 5TB worth of
>> critical files. Then once I'm back in production I can gradually
>> restore the next 25TB and regenerate the rest.
>> It's feasible and probably prudent. It's not that I'm expecting data loss
>> bugs in new code. My concern is less about the likelihood of the
>> problem, and more about the severity of the problem.
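
[Editorial sketch] The staged restore described above (5TB of critical files first, then the next 25TB) can be put into rough numbers. The sizes come from the message; the sustained restore throughput of 200 MB/s is purely an assumed figure for illustration, and the helper names `restore_hours` and `staged_plan` are hypothetical.

```python
TB = 10**12  # bytes in a decimal terabyte
MB = 10**6   # bytes in a decimal megabyte

# Assumed sustained restore throughput; not a figure from the thread.
ASSUMED_THROUGHPUT_MB_S = 200.0

# Restore stages from the message: critical files first, then the next tier.
STAGES = [("critical files", 5.0), ("next tier", 25.0)]


def restore_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy size_tb terabytes at throughput_mb_s MB/s."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_tb * TB / (throughput_mb_s * MB)
    return seconds / 3600


def staged_plan(stages, throughput_mb_s):
    """Return (stage name, cumulative hours elapsed when that stage finishes)."""
    plan = []
    elapsed = 0.0
    for name, size_tb in stages:
        elapsed += restore_hours(size_tb, throughput_mb_s)
        plan.append((name, elapsed))
    return plan


if __name__ == "__main__":
    for name, hours in staged_plan(STAGES, ASSUMED_THROUGHPUT_MB_S):
        print(f"{name}: done after {hours:.1f} h")
```

Under this assumed throughput the critical tier is back in well under a working day, while the remaining data can trickle in after production resumes, which is the point of restoring in stages.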
>> To back up a 10PB system, you would want to back it up to a second
>> 10PB system located on an opposite coast. In fact if this system is
>> important to your business, you must do this. And then there is the
>> question, do you stagger software updates on these two systems?
>> Probably.
>> You might want to find someone from EMC or NetApp, and get their
>> feedback on how software changes, QA, and beta testing are handled
>> (including timelines). Storage systems are really a risky type of code
>> to modify, for lots of reasons more apparent to the downstream
>> consumers than to developers. :)
> I guess if we want to separate the backup from the original storage
> on the hardware level we have two options:
> a) mirror data to another dfs cluster (earlier version, opposite coast)
> b) copy critical data to a different (local) fs
> If only 5% of the whole data set is critical you might want to go with (b).
> This can be a separate (dfs based) application or an extension to dfs.
> If ~100% is critical then (a) is the only way.
> On a related issue, do we want to add the upgrade procedures task to the
> list?
> Thanks,
> Konstantin
