hadoop-common-dev mailing list archives

From Konstantin Shvachko <shv.had...@gmail.com>
Subject Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly
Date Sun, 13 Dec 2020 21:08:25 GMT
Hi Steve,

I am not sure I fully understand what is broken here. It is not an
incompatible change, right?
Could you please explain what you think the process is.
Would be best if you could share a link to a document describing it.
I would be glad to follow up with tests and documentation that are needed.

As you can see I proposed multiple solutions to the problem in the jira.
Seemed nobody was objecting, so I chose one and explained why.
I believe we call it lazy consensus.

Stay safe,
--Konstantin

On Sun, Dec 13, 2020 at 10:22 AM Chao Sun <sunchao@apache.org> wrote:

> > This is an API where it'd be ok to have a no-op if not implemented,
> > correct? Or is there a requirement like Syncable that specific guarantees
> > are met?
>
> Yes I think it's ok to leave it as no-op for other non-HDFS FS impls: it is
> only used by HDFS standby reads so far.
>
>
>
> On Sun, Dec 13, 2020 at 4:58 AM Steve Loughran <stevel@cloudera.com>
> wrote:
>
> > This isn't worth holding up the RC. We'd just add something to the
> > release notes "use with caution". And if we can get what the API does
> > defined in a way which works, it shouldn't need changing.
> >
> > (which reminds me, I do need to check that RC out, don't I?)
> >
> > On Sun, 13 Dec 2020 at 09:00, Xiaoqiao He <hexiaoqiao@apache.org> wrote:
> >
> >> Thanks Steve very much for your discussion here.
> >>
> >> I have left some comments inline. I will follow this thread and wait for
> >> the final conclusion before deciding whether we should prepare another
> >> release candidate of 3.2.2.
> >> Thanks Steve and Chao again for your warm discussions.
> >>
> >> On Sat, Dec 12, 2020 at 7:18 PM Steve Loughran
> >> <stevel@cloudera.com.invalid> wrote:
> >>
> >> > Maybe it's not in the release; it's certainly in the 3.2 branch. Will
> >> > check further. If it's in the release I was thinking of adding a
> >> > warning in the notes "unstable API"; stable if invoked from DFSClient
> >>
> >> On Fri, 11 Dec 2020 at 18:21, Chao Sun <sunchao@apache.org> wrote:
> >> >
> >> > > I'm just curious why this is included in the 3.2.2 release?
> >> > > HDFS-15567 is tagged with 3.2.3 and the corresponding HDFS-14272 on
> >> > > server side is tagged with 3.3.0.
> >> >
> >>
> >> I just checked: HDFS-15567 is included in Hadoop 3.2.2 RC4. IIRC, I cut
> >> branch-3.2.2 in early October. At that time branch-3.2.3 had been
> >> created, but the source code was not completely frozen because several
> >> blocking issues had been reported; the code freeze happened around mid
> >> October. Some issues tagged with 3.2.3 were also included in 3.2.2
> >> during that period, including HDFS-15567. I will check them later and
> >> make sure that the tags are correct.
> >>
> >>
> >> > >
> >> > > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
> >> > > > HDFS? Should it take path, msync(path) so that viewFS knows where
> >> > > > to forward it?
> >> > >
> >> > > The API shouldn't take any path - for viewFS I think it should call
> >> > > this on all the child file systems. It might also need to handle the
> >> > > case where some downstream clusters support this capability while
> >> > > others don't.
> >> > >
> >> >
> >> > That's an extra bit of work for ViewFS then. It should probe for
> >> > capability and invoke as/when supported.
> >> >
> >> > >
> >> > > > Options
> >> > > > 1. I roll HDFS-15567 back "please follow process"
> >> > > > 2. Someone does a followup patch with specification and contract
> >> > > > test, view FS. Add even more to the java
> >> > > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
> >> > > > FileSystem & HDFS can implement. ViewFS and filterFS still need to
> >> > > > pass through.
> >> > >
> >> > > I'm slightly in favor of the hasPathCapabilities approach and make
> >> > > this a mixin where FS impls can optionally support. Happy to hear
> >> > > what others think.
> >> > >
> >> >
> >> > Mixins are great when FC and FS can both implement; makes it easier to
> >> > code against either. All the filtering/aggregating FS's will have to
> >> > implement it, which means that presence of the interface doesn't
> >> > guarantee support.
> >> >
> >> > This is an API where it'd be ok to have a no-op if not implemented,
> >> > correct? Or is there a requirement like Syncable that specific
> >> > guarantees are met?
> >> >
> >> > >
> >> > > Chao
> >> > >
> >> > >
> >> > > On Fri, Dec 11, 2020 at 9:00 AM Steve Loughran
> >> > > <stevel@cloudera.com.invalid> wrote:
> >> > >
> >> > > > Silence from the HDFS team
> >> > > >
> >> > > >
> >> > > > Hadoop 3.2.2 is in an RC; it has the new FS API call. I really don't
> >> > > > want to veto the release just because someone pulled up a method
> >> > > > without doing the due diligence.
> >> >
> >>
> >> Thanks Steve for starting this discussion. I agree to roll back
> >> HDFS-15567 if there are still incompatibility issues that are not
> >> completely resolved. The release will not be a blocker here; I am willing
> >> to prepare another RC if we reach agreement. To be honest, I think it is
> >> better to involve Shvachko here.
> >>
> >>
> >> > > > Is anyone in the HDFS going to do that due diligence or should we
> >> > > > include something in the release notes "msync()" must be considered
> >> > > > unstable.
> >> > > >
> >> > > > Then we can do a proper msync().
> >> > > >
> >> > > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
> >> > > > HDFS? Should it take path, msync(path) so that viewFS knows where
> >> > > > to forward it?
> >> > > >
> >> > > > Alternatively: go with an MSync interface which those few FS which
> >> > > > implement it (hdfs) can do that, and the fact that it doesn't have
> >> > > > doc or tests won't be a blocker any more?
> >> > > >
> >> > > > -steve
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thu, 10 Dec 2020 at 12:41, Steve Loughran <stevel@cloudera.com>
> >> > > > wrote:
> >> > > >
> >> > > > >
> >> > > > > Gosh, has it really been only since February since I last asked
> >> > > > > the HDFS dev list to stop adding anything to
> >> > > > > FileSystem/FileContext APIs without
> >> > > > >
> >> > > > > * mentioning this on the hadoop-common list.
> >> > > > > * specifying what it does in filesystem.md
> >> > > > > * with a contract test
> >> > > > > * a new hasPathCapabilities probe. Throwing
> >> > > > > UnsupportedOperationException only lets people work out if it is
> >> > > > > unsupported through invocation. Being able to probe for it is
> >> > > > > better.
> >> > > > > * ViewFS support.
> >> > > > > * And, for any new API, one which works well for high-latency
> >> > > > > object stores: returning Future<Something> and
> >> > > > > Future<RemoteIterator<Something>> when > 1 result is returned
> >> > > > >
> >> > > > > This needs to hold even for pulling something up from HDFS.
> >> > > > > Because if another FS wants to implement it, they need to know
> >> > > > > what it does, and have tests to verify this. I say this as
> >> > > > > someone who has tried to document HDFS rename() semantics and
> >> > > > > gave up.
> >> > > > >
> >> > > > > It's really frustrating that every time someone does an FS API
> >> > > > > change like this in the past (most recently HDFS-13616) I am the
> >> > > > > one who has to keep sending the reminders out, and then having to
> >> > > > > try and clean up.
> >> > > > >
> >> > > > > So what now?
> >> > > > >
> >> > > > > Options
> >> > > > > 1. I roll HDFS-15567 back "please follow process"
> >> > > > > 2. Someone does a followup patch with specification and contract
> >> > > > > test, view FS. Add even more to the java
> >> > > > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
> >> > > > > FileSystem & HDFS can implement. ViewFS and filterFS still need
> >> > > > > to pass through.
> >> > > > >
> >> > > > > *If nobody is going to volunteer for the specification/test
> >> > > > > changes, I'm happy for the rollback. It'll remind people about
> >> > > > > process.*
> >> > > > >
> >> > > > > Pre-emptive Warning: No matter what we do for this patch, I will
> >> > > > > roll back the next change which adds a new API if it's not
> >> > > > > accompanied by specification and tests.
> >> > > > >
> >> > > > > Unhappily yours,
> >> > > > >
> >> > > > > Steve
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
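
For readers following the quoted thread, here is a minimal Java sketch of
the "probe for capability and invoke as/when supported" idea discussed for
ViewFS. It is only an illustration under stated assumptions: ChildFs,
MSyncable, CAP_MSYNC, SyncingFs, PlainFs and msyncAll are hypothetical names
made up for this sketch and are not Hadoop classes or constants. The
hasCapability() method stands in for the hasPathCapabilities probe mentioned
in the thread. The sketch checks both the interface and the probe, because
the thread notes that presence of the interface alone does not guarantee
support.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MsyncSketch {

    /** Hypothetical capability name; NOT a real Hadoop constant. */
    static final String CAP_MSYNC = "hypothetical.capability.msync";

    /** Hypothetical minimal child file system exposing a capability probe. */
    interface ChildFs {
        String name();
        boolean hasCapability(String capability);
    }

    /** Hypothetical mixin for file systems that can actually msync. */
    interface MSyncable {
        void msync() throws IOException;
    }

    /** Stand-in for an HDFS-like child that supports msync. */
    static final class SyncingFs implements ChildFs, MSyncable {
        private final String name;
        int msyncCalls = 0;

        SyncingFs(String name) { this.name = name; }

        @Override public String name() { return name; }

        @Override public boolean hasCapability(String capability) {
            return CAP_MSYNC.equals(capability);
        }

        @Override public void msync() { msyncCalls++; }
    }

    /** Stand-in for a child store without msync support. */
    static final class PlainFs implements ChildFs {
        private final String name;

        PlainFs(String name) { this.name = name; }

        @Override public String name() { return name; }

        @Override public boolean hasCapability(String capability) {
            return false;
        }
    }

    /**
     * View-style fan-out: call msync() only on children that both implement
     * the mixin and report the capability; treat every other child as a no-op.
     * Returns the names of the children that were synced.
     */
    static List<String> msyncAll(List<ChildFs> children) throws IOException {
        List<String> synced = new ArrayList<>();
        for (ChildFs child : children) {
            if (child instanceof MSyncable && child.hasCapability(CAP_MSYNC)) {
                ((MSyncable) child).msync();
                synced.add(child.name());
            }
        }
        return synced;
    }

    public static void main(String[] args) throws IOException {
        List<ChildFs> children =
            Arrays.asList(new SyncingFs("hdfs-a"), new PlainFs("store-b"));
        System.out.println("synced: " + msyncAll(children));
    }
}
```

Probing first means a caller can find out whether msync is supported without
invoking it, which is the advantage over throwing
UnsupportedOperationException that the thread points out.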
