hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1108) Log newly allocated blocks
Date Wed, 24 Aug 2011 07:57:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090068#comment-13090068 ]

Konstantin Shvachko commented on HDFS-1108:

bq. I and several others am working on approach #2

Thanks, Todd, for clarifying this. Could you please also share the design document for your
approach? I would like to learn the details and understand why you chose an approach that
at this point does not seem optimal to me for the project.

bq. option 1 causes dataloss regardless of your opinions on HA

This is not data loss, Todd. This is a tradeoff between performance and the persistence
of data. With flush and sync, one can control when to favor performance over guaranteed
persistence and when to do the opposite, regardless of my opinion on HA.
Is anybody asking to remove this flexibility?

bq. How do you differentiate between logSync() to that stream Vs stream to the disk?

I do not differentiate between streams. As I said, the addBlock() transaction should be treated
the same way as setTimes(): it is logged (and batched) but not synced. There is no
consistency issue here. The transaction will eventually be committed to the journal by another
sync-able transaction or by a file close().
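
To make the "logged (and batched) but not synced" behavior concrete, here is a minimal sketch of the idea. It is a toy model, not the NameNode's edit log code: the class ToyEditLog and its methods are hypothetical names chosen for illustration, and the journal is simulated with an in-memory list. It only shows that a batched edit becomes durable when a later sync-able transaction triggers a sync.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a batched edit log. Hypothetical; NOT the actual HDFS code. */
public class ToyEditLog {
    // Edits accepted but not yet durable (they would be lost on a crash).
    private final List<String> buffered = new ArrayList<>();
    // Edits that a sync has pushed to the simulated journal.
    private final List<String> journal = new ArrayList<>();

    /** Append an edit to the in-memory batch without forcing it to the journal. */
    public void logEdit(String edit) {
        buffered.add(edit);
    }

    /** Commit every buffered edit, in order, to the simulated journal. */
    public void logSync() {
        journal.addAll(buffered);
        buffered.clear();
    }

    /** Number of edits still waiting for a sync. */
    public int pending() {
        return buffered.size();
    }

    /** Copy of the edits that have reached the simulated journal. */
    public List<String> journal() {
        return new ArrayList<>(journal);
    }

    public static void main(String[] args) {
        ToyEditLog log = new ToyEditLog();
        log.logEdit("addBlock /f blk_1"); // logged and batched, not synced
        log.logEdit("setTimes /g");       // same treatment
        System.out.println("pending before sync: " + log.pending());

        log.logEdit("close /f");          // a sync-able transaction
        log.logSync();                    // carries the earlier edits with it
        System.out.println("journal after sync: " + log.journal());
    }
}
```

In this model the window of exposure is the time between logEdit() and the next logSync(): a crash in that window loses the buffered addBlock, which is the performance versus persistence tradeoff under discussion.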

bq. The approach 2 has different requirements that many are interested in. I have repeated
this many times.

Suresh, yes, you do repeat it. But you never answered MY question: which HA approach are you
implementing? As you see, you have to make choices even on issues that seemed to be a common
part of all approaches.

I like Milind's idea about an implementation "without shared storage assumption".

Sticking to the point.
- Without HA considerations, this patch removes the flexibility to choose between performance and
guaranteed persistence of data.
- There should be a good reason for that.
- An HA solution with shared storage seems to be the reason.
- The community has not seen the design, has not discussed it, and is not aware of why this
one is better than the other three (or was it four) published in different jiras.

> Log newly allocated blocks
> --------------------------
>                 Key: HDFS-1108
>                 URL: https://issues.apache.org/jira/browse/HDFS-1108
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>         Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt, hdfs-1108.txt
> The current HDFS design says that newly allocated blocks for a file are not persisted
> in the NN transaction log when the block is allocated. Instead, a hflush() or a close() on
> the file persists the blocks into the transaction log. It would be nice if we can immediately
> persist newly allocated blocks (as soon as they are allocated) for specific files.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

