airflow-dev mailing list archives

From Daniel Standish <dpstand...@gmail.com>
Subject Re: AIP-21 (Move operators to Core) - "cross_transfer" packages
Date Fri, 04 Oct 2019 14:54:16 GMT
One case came up for us recently where it made sense to write an
MsSql*From*S3Operator.

I think using "source" makes sense in general, but in this case calling
this an S3ToMsSqlOperator and putting it under AWS seems silly, even though
you could say S3 is the "source" here.

I think in most of these cases we say "let's use source" because source is
where the actual work is done and destination is just storage.

Does a guideline saying "ignore storage" (i.e. storage is secondary when
deciding where an operator lives) make sense?
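To make the two candidate rules concrete, here is a minimal sketch. The helper names, the service-to-provider mapping and the set of "storage-only" services below are all hypothetical illustrations; they are not part of Airflow or AIP-21.

```python
from typing import Dict, Set

# Hypothetical mapping from a service name to its provider package.
PROVIDER_OF: Dict[str, str] = {
    "s3": "aws",
    "gcs": "gcp",
    "bigquery": "gcp",
    "mssql": "mssql",
}

# Hypothetical set of services treated as "just storage".
STORAGE_SERVICES: Set[str] = {"s3", "gcs"}


def source_rule(source: str, destination: str) -> str:
    """Place the transfer operator in the source's provider package.

    `destination` is accepted only so both rules share a signature.
    """
    return PROVIDER_OF[source]


def ignore_storage_rule(source: str, destination: str) -> str:
    """Place the operator with the non-storage side when the source is
    pure storage; otherwise fall back to the source rule."""
    if source in STORAGE_SERVICES and destination not in STORAGE_SERVICES:
        return PROVIDER_OF[destination]
    return PROVIDER_OF[source]


if __name__ == "__main__":
    for src, dst in [("s3", "mssql"), ("gcs", "s3"), ("bigquery", "gcs")]:
        print(src, "->", dst, source_rule(src, dst), ignore_storage_rule(src, dst))
```

Under the plain source rule the S3-to-MsSql operator lands in `aws`; under the "ignore storage" rule it lands with MsSql, while storage-to-storage transfers such as GCS to S3 still follow the source.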



On Fri, Oct 4, 2019 at 6:42 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> It looks like we have general consensus about putting transfer operators
> into the "source provider" package.
> That works for me as well.
>
> Since I will be updating AIP-21 to reflect the "google" vs. "gcp" case, I
> will also update it to add this decision.
>
> If no-one objects (Lazy Consensus
> <https://community.apache.org/committers/lazyConsensus.html>) by
> Monday 7th of October, 3.20 CEST, we will update AIP-21 with information
> that transfer operators should be placed in the "source" provider module.
>
> J.
>
> On Tue, Sep 24, 2019 at 1:34 PM Kamil Breguła <kamil.bregula@polidea.com>
> wrote:
>
> > On Mon, Sep 23, 2019 at 7:42 PM Chris Palmer <chris@crpalmer.com> wrote:
> > >
> > > On Mon, Sep 23, 2019 at 1:22 PM Kamil Breguła <kamil.bregula@polidea.com>
> > > wrote:
> > >
> > > > On Mon, Sep 23, 2019 at 7:04 PM Chris Palmer <chris@crpalmer.com>
> > > > wrote:
> > > > >
> > > > > Is there a reason why we can't use symlinks to have copies of the files
> > > > > show up in both subpackages? That way `gcs_to_s3.py` would be under both
> > > > > `aws/operators/` and `gcp/operators/`. I could imagine there may be
> > > > > technical reasons why this is a bad idea, but I just thought I would ask.
> > > > >
> > > > Symlinks are not supported by git.
> > > >
> > > >
> > > Why do you say that? This blog post
> > > <https://www.mokacoding.com/blog/symliks-in-git/> details how you can use
> > > them, and the caveat that the links need to be relative rather than absolute.
> > > The example repo linked at the end includes a symlink, which worked
> > > fine for me when I cloned it. But maybe this is not relevant given the below:
> >
> > We would still have to check whether Python packages can contain symlinks,
> > but I'm wary of this mechanism. It is not widely used and may have
> > unexpected consequences.
> >
> >
> > > > > Likewise, someone who spends 99% of their time working in AWS and using all
> > > > > the operators in that subpackage might not think to look in the GCP
> > > > > package the first time they need a GCS to S3 operator. I'm admittedly
> > > > > terrible at documentation, but if duplicating the files via symlinks isn't
> > > > > an option, is there an easy way we could duplicate the documentation
> > > > > for those operators so they are easily findable in both doc sections?
> > > > >
> > > >
> > > > Recently, I updated the documentation:
> > > > https://airflow.readthedocs.io/en/latest/integration.html
> > > > We have a list of all integrations for AWS, Azure and GCP. If an operator
> > > > concerns two cloud providers, it is listed in both places. That's good for
> > > > documentation; the DRY rule only applies to source code.
> > > > I am working on documentation for the other operators.
> > > > My work is part of this ticket:
> > > > https://issues.apache.org/jira/browse/AIRFLOW-5431
> > > >
> > > >
> > > This updated documentation looks great, and is definitely heading in a
> > > direction that makes operators easier to find and addresses my concerns.
> > > (Although it took me a while to realize those tables can be scrolled
> > > horizontally!)
> > >
> > I'm working on a redesign of the documentation theme. It's part of AIP-11:
> >
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-11+Create+a+Landing+Page+for+Apache+Airflow
> > We are currently collecting feedback from the first phase: we sent
> > materials to the community and also ran tests with real users:
> >
> > https://lists.apache.org/thread.html/6fa1cdceb97ed17752978a8d4202bf1ff1a86c6b50bbc9d09f694166@%3Cdev.airflow.apache.org%3E
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
>
