airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Invitation: Airflow Contributors & Roadmapping Meeting @ Thu Oct 6, 2016 10am - 12pm (PDT) (gurer.kiratli@airbnb.com)
Date Tue, 27 Sep 2016 08:16:14 GMT
Hi Guys,

I’m not sure if this will fit the agenda, and we don’t have something like breakout rooms ;-).
So this is for your consideration, if we can add it:

1. MySQL 5.6 compatibility issues: Fractional seconds vs Round off
By default MySQL does not store fractional seconds (a DATETIME or TIMESTAMP column declared
without an explicit precision keeps none), which deviates from the SQL standard
(http://dev.mysql.com/doc/refman/5.6/en/fractional-seconds.html). Therefore there is a
difference between how MySQL stores seconds and how SQLite and Postgres do. It also means that
for some edge cases we are incompatible with MySQL 5.6. This was caught by the unit tests
when trying to move to Travis’ Ubuntu Trusty, which relies on MySQL 5.6. It has been
proposed to change the MySQL schema to store fractional seconds, but a counterproposal was
made to do a round-off in code. Both options have their pros and cons, but the issue is
blocking the upgrade of the CI environment.
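
To make the "round off in code" option concrete, here is a minimal sketch of what normalizing timestamps before they reach the database could look like. The helper names `truncate_to_second` and `round_to_second` are invented for this example and are not Airflow functions; whether to truncate or round is itself one of the details that would have to be agreed on.

```python
from datetime import datetime, timedelta


def truncate_to_second(dt):
    """Drop the fractional part, keeping the whole second (hypothetical helper)."""
    return dt.replace(microsecond=0)


def round_to_second(dt):
    """Round to the nearest whole second, halves rounding up (hypothetical helper)."""
    if dt.microsecond >= 500000:
        dt = dt + timedelta(seconds=1)
    return dt.replace(microsecond=0)


ts = datetime(2016, 9, 27, 8, 16, 14, 600000)
print(truncate_to_second(ts))  # 2016-09-27 08:16:14
print(round_to_second(ts))     # 2016-09-27 08:16:15
```

The two helpers disagree for any value at or past the half second, so whichever one is chosen would have to be applied consistently wherever timestamps are compared.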

2. Perceived scheduler stability & roadmap
Several issues affect the perception of the robustness of the scheduler. Reports have been
coming in of schedulers being stuck that have been impossible to replicate. The reports
themselves often lack detail (executor, broker, Airflow config, Python version, OS, etc.) and
quite often are incorrect due to the difference between the in-process executors
(Local/Sequential) and out-of-band executors (Celery/Mesos): long-running tasks will affect
the scheduler loop in the case of in-process executors. However, there are some other issues
that need to be addressed.

a) Airflow’s heritage contains a “num_runs” feature that made the scheduler stop after
num_runs loops. Although a full explanation has never been provided, the most likely explanation
is that this was related to the scheduler, in certain circumstances, not being able to queue
new tasks when using the CeleryExecutor. The scheduler then seems “stuck”. A recent
refactor of the scheduler code also introduced “run_duration”, which more or less
seems to address the same issue by stopping the scheduler after a certain amount of time.
This run_duration cannot be disabled at the moment. Nevertheless, several shops are running
on Celery without num_runs (our scheduler uptime is 76 days at the moment). This raises the
question: what is the root cause for having the functionality, and was the root issue maybe
fixed upstream? I guess reliability and predictability of the scheduler are important to everyone.
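
As a discussion aid, the two stop conditions described above can be sketched as a plain loop. This is not the Airflow scheduler code: `run_scheduler_loop`, `do_one_loop` and the `clock` parameter are hypothetical names, and treating `None` as "limit disabled" is an assumption about how an opt-out might look.

```python
import time


def run_scheduler_loop(do_one_loop, num_runs=None, run_duration=None,
                       clock=time.monotonic):
    """Call do_one_loop() until a stop condition is met (hypothetical sketch).

    num_runs: stop after this many loops; None disables the limit.
    run_duration: stop after this many seconds; None disables the limit.
    Returns the number of loops executed.
    """
    started = clock()
    loops = 0
    while True:
        if num_runs is not None and loops >= num_runs:
            break
        if run_duration is not None and clock() - started >= run_duration:
            break
        do_one_loop()
        loops += 1
    return loops


# Stops after three loops; with both limits set to None it would run forever.
print(run_scheduler_loop(lambda: None, num_runs=3))  # 3
```

With both limits disabled the loop never exits on its own, which is the mode the shops running without num_runs are effectively relying on.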

b) In some circumstances the scheduler seems to get stuck (not logging anymore) when tasks
are being scheduled through scheduler child processes (1.7.1.3 - multiprocessing). See also Jira-366.
I have not been able to replicate this locally with either the LocalExecutor or the CeleryExecutor.

c) Event driven scheduler?

3. Logging etiquette
Logging configuration (syslog, files) in Airflow is not standardized. As a result, arbitrary
files are being written, which makes it difficult to debug certain issues.
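
One possible direction, sketched here only to anchor the discussion, is a single configuration built with the standard library's `logging.config.dictConfig`, so that every component shares one format and one set of handlers. The function `build_logging_config`, the logger name `example.scheduler` and the chosen format string are assumptions made for this example, not existing Airflow settings.

```python
import logging
import logging.config


def build_logging_config(level="INFO", log_file=None):
    """Return a dictConfig dictionary with one shared format (hypothetical helper)."""
    handlers = {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "standard",
        },
    }
    # Only write a file when one is explicitly requested.
    if log_file is not None:
        handlers["file"] = {
            "class": "logging.FileHandler",
            "filename": log_file,
            "formatter": "standard",
        }
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "standard": {
                "format": "%(asctime)s %(name)s %(levelname)s %(message)s",
            },
        },
        "handlers": handlers,
        "root": {"level": level, "handlers": sorted(handlers)},
    }


logging.config.dictConfig(build_logging_config())
logging.getLogger("example.scheduler").info("scheduler loop started")
```

A syslog destination could be added in the same way with `logging.handlers.SysLogHandler`; the point of the sketch is only that destinations are declared in one place instead of being opened ad hoc.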

4. API 
- Protocols (Avro/Protobuf, JSON)
- Security (OAuth, Kerberos); integration with web UI authentication
- Own process?
- Interprocess API (Tasks asking for connection details from a central API, instead of going
to the DB directly)


Bolke



> On 17 Sep 2016, at 00:53, gurer.kiratli@airbnb.com.INVALID wrote:
> 
> Airflow Contributors & Roadmapping Meeting
> Hi all,
> Again it has been a while since our last meeting. Let's have another meeting to sync up!
> 
> We are super happy to host all you folks at Airbnb (888 Brannan St 94103) on October 7th
> at 10:00am. Also we will have a webex session at https://airbnb.webex.com/meet/gurer.kiratli.
> 
> I will send this out as a Google Calendar invitation, but because it goes through the mail
> group I don't see your responses. If you are planning to come, please respond back to me
> with your first name and last name. And please try to arrive by 9:30 so we can check you in
> and head to the meeting room. : )
> 
> Here is the proposed agenda:
> 10:00am -10:45am PDT 
> Contributors sync-up: progress and plan
> Release Schedule, Management 
> 10:45am - 11:00am PDT
> Coffee Break
> 11:00am - 12:00pm PDT
> Roadmap discussion
> 12:00pm - 1:00pm PDT
> Lunch @ Airbnb
> Cheers,
> 
> Gurer
> 
> === * * * ===
> https://airbnb.webex.com/meet/gurer.kiratli
> [WebEx: 000000000]
> 
> When
> Thu Oct 6, 2016 10am – 12pm Pacific Time
> Where
> Airbnb HQ, 888 Brannan St, San Francisco, CA 94103, USA (map <https://maps.google.com/maps?q=Airbnb+HQ,+888+Brannan+St,+San+Francisco,+CA+94103,+USA&hl=en>)
> Calendar
> gurer.kiratli@airbnb.com
> Who
> • gurer.kiratli@airbnb.com - organizer
> • dev@airflow.incubator.apache.org

