airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Multi-layered official image for Airflow
Date Thu, 17 Jan 2019 11:12:39 GMT
I've updated the calculations after removing some artifacts and rebuilding
the images from scratch. Here are the updated conclusions:


   - The multi-layered image is only slightly bigger than the mono-layered
   one (around *2% more* in total), and a single download takes only about
   1 s longer (33.7 s vs. 32.7 s), i.e. *3% longer*.
   - Regular downloads are much cheaper with the multi-layered image: a
   simulated user who pulls the Airflow image twice a week downloads
   *4950 MB* (multi-layered) vs. *13546 MB* (mono-layered) over the course
   of 8 weeks, i.e. *about 63% less data* to download.
   - The multi-layered image therefore seems much better suited to users who
   download the image regularly.
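The percentages in the list above follow directly from the reported figures. A few lines of Python reproduce them; only the four reported numbers come from the measurements, the rest is plain arithmetic:

```python
def percent_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0

# Figures reported in the updated measurements (seconds and MB).
download_time_s = {"multi": 33.7, "mono": 32.7}
eight_week_mb = {"multi": 4950, "mono": 13546}

# Positive: how much longer a single multi-layered download takes.
slower = percent_change(download_time_s["multi"], download_time_s["mono"])
# Negated change: how much less data the multi-layered image needs over 8 weeks.
saving = -percent_change(eight_week_mb["multi"], eight_week_mb["mono"])

print(f"single download: {slower:.1f}% longer")
print(f"8-week traffic: {saving:.1f}% less data")
```

It prints 3.1% for the single download and 63.5% for the 8-week traffic.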


On Wed, Jan 16, 2019 at 10:59 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> Hello Everyone,
>
> Following our discussion of a mono-layered vs. multi-layered official
> image for Airflow in https://github.com/apache/airflow/pull/4483, I
> prepared a proof-of-concept PR with a multi-layered image (based on the
> mono-layered one), ran some calculations, and summarized the conclusions
> in this proposal (I wanted hard numbers to back the claim that a
> multi-layered Dockerfile is better):
>
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-10+Multi-layered+official+Airflow+image
>
> The conclusions I reached:
>
>    - The multi-layered image is even slightly smaller than the
>    mono-layered one, so it is better even if you download it only once.
>    - Regular downloads are much cheaper with the multi-layered image: a
>    simulated user who pulls the Airflow image twice a week downloads
>    5.7 GB (multi-layered) vs. 16.15 GB (mono-layered) over the course of
>    8 weeks.
>    - The multi-layered image is the better choice.
>
>
> I based those calculations on the PR I prepared:
> https://github.com/apache/airflow/pull/4543, where I implemented a fairly
> clean multi-layered Dockerfile that is easy to maintain.
>
> It's based on my experience with Airflow Breeze
> <https://github.com/PolideaInternal/airflow-breeze>, the GCP development
> environment we recently used to develop 30+ GCP-based operators.
>
> I hope we, as a community, can agree that multi-layered is better and
> go in this direction :). I am happy to iterate on my PR to make it even
> better.
>
> J.
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
> E: jarek.potiuk@polidea.com
>
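Why regular pulls are so much cheaper with a multi-layered image can be illustrated with a toy model. It assumes that the client re-downloads a layer only when its content digest is not already in the local cache (which is how `docker pull` treats layers), and it uses made-up sizes and change frequencies, not the AIP-10 measurements: a hypothetical 900 MB image that is either one layer, or split into a base layer (400 MB, never changes), a dependencies layer (400 MB, changes every 4th build) and an Airflow sources layer (100 MB, changes every build). The names `mono_image`, `multi_image` and `simulate` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    digest: str   # content hash: identical content -> identical digest
    size_mb: int


def pull(image: list[Layer], cache: set[str]) -> int:
    """Download only the layers missing from the local cache; return MB fetched."""
    fetched = 0
    for layer in image:
        if layer.digest not in cache:
            fetched += layer.size_mb
            cache.add(layer.digest)
    return fetched


def mono_image(build: int) -> list[Layer]:
    # Hypothetical: everything lives in one layer, so every rebuild gets a new digest.
    return [Layer(f"all-{build}", 900)]


def multi_image(build: int) -> list[Layer]:
    # Hypothetical sizes and change frequencies (same 900 MB total as mono_image).
    return [
        Layer("base", 400),                # never changes
        Layer(f"deps-{build // 4}", 400),  # changes every 4th build
        Layer(f"src-{build}", 100),        # changes every build
    ]


def simulate(make_image, pulls: int) -> int:
    """Total MB downloaded by one user pulling a freshly built image `pulls` times."""
    cache: set[str] = set()
    return sum(pull(make_image(build), cache) for build in range(pulls))


PULLS = 16  # twice a week for 8 weeks, assuming a new build before every pull
print("mono :", simulate(mono_image, PULLS), "MB")
print("multi:", simulate(multi_image, PULLS), "MB")
```

Under these assumptions the mono-layered image costs 14400 MB over 16 pulls and the multi-layered one 3600 MB, while a single first-time pull costs 900 MB either way. The real ratio depends on how often each layer actually changes, which is what the measurements in the proposal capture.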


