hadoop-user mailing list archives

From lei liu <liulei...@gmail.com>
Subject Re: ClientProtocol create, mkdirs, rename and delete methods are not Idempotent
Date Mon, 05 Nov 2012 01:59:15 GMT
Hi Steve,

Thank you for your detailed and patient answer. I understand now.


2012/11/5 Steve Loughran <stevel@hortonworks.com>

>
>
> On 4 November 2012 17:25, lei liu <liulei412@gmail.com> wrote:
>
>> I want to know which operations are idempotent and which are not, and
>> why. Could you give me an example?
>>
>>
>
>
> When you say "idempotent", I presume you mean the operation happens
> "at-most-once"; ignoring the degenerate case where all requests are
> rejected.
>
> You can take operations that fail if their conditions aren't met (delete
> path named="something" being the simplest). The operation can send an error
> back, "file not found", but the client library can then downgrade that to an
> idempotent assertion: "when the acknowledgment was sent from the namenode,
> there was nothing at the end of this path". That will hold on a replay,
> though if someone creates a file in between, that replay could be
> observable.
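
A minimal sketch of that downgrade, assuming a toy in-memory path table in
place of a real namenode. The names ToyNamespace, FileNotFound and
delete_idempotent are hypothetical and are not part of any Hadoop API.

```python
class FileNotFound(Exception):
    """Raised by the toy server when the path does not exist."""


class ToyNamespace:
    """Hypothetical in-memory stand-in for a namenode's path table."""

    def __init__(self, paths=()):
        self.paths = set(paths)

    def delete(self, path):
        # Strict server-side semantics: fail if the precondition is not met.
        if path not in self.paths:
            raise FileNotFound(path)
        self.paths.remove(path)


def delete_idempotent(ns, path):
    """Client-side wrapper that downgrades 'not found' to the assertion
    'nothing existed at this path when the server answered'."""
    try:
        ns.delete(path)
    except FileNotFound:
        pass  # Already absent: the assertion still holds.
    return True


ns = ToyNamespace({"/data/file"})
first = delete_idempotent(ns, "/data/file")   # really deletes
second = delete_idempotent(ns, "/data/file")  # replay: nothing left to delete
```

Both calls report the same outcome, which is what makes the replay harmless
here. The sketch does not model a second client creating the path between the
two calls, which is the case where the replay becomes observable.
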
>
>
> Now what about move(src,dest)?
>
> if it succeeds, then there is no src path, as it is now at "dest".
>
> What happens if you call it a second time? There is no src, only dest. You
> can't report that back as a success as it is clearly a failure: there is no
> src left to move. It's hard to convert that into an assertion on the
> observable state of the system as the state doesn't reflect the history, so
> you need some temporal logic in there too: at time t0 there existed a
> directory src, at time t1 the directory src no longer existed and its
> contents were now found under directory "dest".
>
> And again, worse: what happens if someone else did something in between and
> created a src directory (which it could do, given that the first one has
> been renamed dest)? The operation replays and the move takes place twice
> -you've just crossed into at-least-once operations, which is not what you
> wanted.
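
The two replay outcomes can be sketched with a toy model. ToyRenameNamespace
is a hypothetical in-memory table, and letting rename overwrite an existing
dest is a simplifying assumption of this sketch, not a statement about how
HDFS treats an existing destination.

```python
class ToyRenameNamespace:
    """Hypothetical in-memory path table mapping path -> contents."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def rename(self, src, dest):
        # Fails when src is missing. Overwriting dest is a toy assumption.
        if src not in self.entries:
            raise FileNotFoundError(src)
        self.entries[dest] = self.entries.pop(src)


ns = ToyRenameNamespace({"/src": "original data"})
ns.rename("/src", "/dest")  # original request succeeds

# Case 1: a plain replay fails, although the first call did its job.
try:
    ns.rename("/src", "/dest")
    replay_failed = False
except FileNotFoundError:
    replay_failed = True

# Case 2: another client recreates /src before the replay arrives.
ns.entries["/src"] = "someone else's data"
ns.rename("/src", "/dest")  # replay "succeeds" and the move happens twice
```

After case 2 the contents under /dest are the other client's data, so the
replay has changed the filesystem a second time instead of being ignored.
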
>
>
> At this point I'm sure you are thinking of having some kind of transaction
> journal, recording that at time Tn, transaction Xn moved the dir. Which
> means you have to start to collect a transaction log of what happened. Now,
> HDFS is effectively a journalled file system; it does record a lot of
> things. It just doesn't record user transactions in it, or rescan the log
> whenever any operation comes in so as to decide what to ignore.
>
> Or you just skip the filesystem changes and have some data structure
> recording "recent" transaction IDs, and ignore repeated requests with the
> same IDs. Better, though you'd need to make that failure-resistant -its
> state must propagate to the journal and any failover namenodes so that a
> transaction replay will be idempotent even if the filesystem fails over
> between the original and replayed transaction. And of course all of this
> needs to be atomic with the filesystem state changes...
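
A rough sketch of the "recent transaction IDs" idea, under the assumption
that the client attaches a unique ID to each request. DedupingServer, txid
and max_remembered are hypothetical names. The sketch keeps the table only in
memory, so it deliberately leaves out the hard parts listed above:
propagation to the journal and to failover namenodes, and atomicity with the
filesystem change.

```python
from collections import OrderedDict


class DedupingServer:
    """Toy server that remembers the results of recent transaction IDs."""

    def __init__(self, entries, max_remembered=1000):
        self.entries = dict(entries)   # path -> contents
        self.recent = OrderedDict()    # txid -> result of the first attempt
        self.max_remembered = max_remembered

    def rename(self, txid, src, dest):
        # A replayed request gets the stored answer and changes nothing.
        if txid in self.recent:
            return self.recent[txid]
        if src in self.entries:
            self.entries[dest] = self.entries.pop(src)
            result = "ok"
        else:
            result = "no such source"
        self.recent[txid] = result
        # Forget the oldest ID once the table is full.
        if len(self.recent) > self.max_remembered:
            self.recent.popitem(last=False)
        return result


server = DedupingServer({"/src": "original data"})
first = server.rename("tx-1", "/src", "/dest")
server.entries["/src"] = "someone else's data"   # another client recreates /src
replay = server.rename("tx-1", "/src", "/dest")  # same ID: ignored
```

The replay returns the stored result and the recreated /src is left alone.
Once an ID has been evicted from the table, a late replay would be executed
again, which is one reason the size of the "recent" window matters.
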
>
> Summary: It gets complicated fast. Throwing errors back to the caller
> makes life a lot simpler and lets the caller choose its own outcome -even
> though that's not always satisfactory.
>
> Alternatively: it's not that people don't want globally distributed
> transactions -it's just hard.
>
>
>
>
>>
>>
>
>> 2012/10/29 Ted Dunning <tdunning@maprtech.com>
>>
>>> Create cannot be idempotent because of the problem of watches and
>>> sequential files.
>>>
>>> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
>>> particular applications, you might find it is OK to treat them as such, but
>>> there are definitely applications where they are not idempotent.
>>>
>>>
>>> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <liulei412@gmail.com> wrote:
>>>
>>>> I think these methods should be idempotent: repeated calls by the same
>>>> client should be harmless.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> LiuLei
>>>>
>>>
>>>
>>
>
