airflow-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [airflow] lokeshlal edited a comment on issue #6975: Dynamic pooling via allowing tasks to use more than one pool slot (depending upon the need)
Date Wed, 01 Jan 2020 09:39:08 GMT
lokeshlal edited a comment on issue #6975: Dynamic pooling via allowing tasks to use more than
one pool slot (depending upon the need)
URL: https://github.com/apache/airflow/pull/6975#issuecomment-569917507
 
 
   @tooptoop4 Yes, the approach looks good when multiple pools are required, as described in
the Jira ticket.
   This PR is useful in a scenario where jobs of different complexity (large jobs, medium
jobs, etc.) are submitted to a Spark cluster and each job requires a different share of the
cluster's capacity. Dynamic pooling lets us control Spark cluster capacity directly from
Airflow using pools. This is aligned with the following Jira ticket:
https://issues.apache.org/jira/browse/AIRFLOW-1467
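
   As a rough illustration of the idea (not Airflow's actual scheduler code), the slot
accounting can be sketched in plain Python. The `SlotPool` class, its method names, and the
slot counts below are hypothetical and exist only for this example; the assumption is that
a pool has a fixed number of slots and each job declares how many it needs.

```python
class SlotPool:
    """Hypothetical pool with a fixed number of slots (illustration only)."""

    def __init__(self, name, total_slots):
        self.name = name
        self.total_slots = total_slots
        self.used_slots = 0

    def open_slots(self):
        # Slots that are still free for new jobs.
        return self.total_slots - self.used_slots

    def try_acquire(self, slots=1):
        # A job may take more than one slot, depending on its size.
        if slots < 1:
            raise ValueError("a job must request at least one slot")
        if slots > self.open_slots():
            return False  # not enough capacity; the job has to wait
        self.used_slots += slots
        return True

    def release(self, slots=1):
        # Give the slots back once the job has finished.
        self.used_slots = max(0, self.used_slots - slots)


# Assumed sizing: the Spark cluster is modelled as 10 slots,
# a large job needs 6 slots and a medium job needs 3.
spark_pool = SlotPool("spark_cluster", total_slots=10)
first_large = spark_pool.try_acquire(6)
medium = spark_pool.try_acquire(3)
second_large = spark_pool.try_acquire(6)  # rejected until capacity frees up
```

   With a single slot per task, the same pool could not tell a large job from a medium one;
weighting the request by job size is what ties the pool to the cluster's capacity.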
   
   The problem described in the Jira ticket https://issues.apache.org/jira/browse/AIRFLOW-6227
can be handled by taking a write lock on a file. That is, if the requirement is to keep a
single writer on a table, then before triggering the Spark job, add another task that takes
a write lock on a file named after the table in the file system (libraries such as fasteners
or lockfile can be used in a Python operator). This ensures that only one job at a time is
triggered for that table, and it keeps the code more dynamic than creating a new pool every
time a new table is introduced. Once the Spark job finishes (whether it fails or passes),
release the lock on the file.
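
   A minimal sketch of the per-table lock follows. It uses only the standard library
(exclusive creation of a lock file) rather than fasteners or lockfile, so that the lock
outlives the task that takes it. The function names, the `.lock` suffix, and the lock
directory are assumptions made for this example; the directory would need to be visible to
every worker.

```python
import os
import tempfile

# Assumption: a directory shared by all workers; the temp dir is a stand-in.
LOCK_DIR = tempfile.gettempdir()


def lock_path(table_name, lock_dir=LOCK_DIR):
    # One lock file per table, named after the table.
    return os.path.join(lock_dir, f"{table_name}.lock")


def acquire_table_lock(table_name, lock_dir=LOCK_DIR):
    """Return True if the lock was taken, False if another writer holds it."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists,
        # so only one caller can create the lock file.
        fd = os.open(lock_path(table_name, lock_dir),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_table_lock(table_name, lock_dir=LOCK_DIR):
    """Remove the lock file; safe to call even if the lock is already gone."""
    try:
        os.remove(lock_path(table_name, lock_dir))
    except FileNotFoundError:
        pass
```

   In a DAG, the acquire step would run in a task before the Spark job and the release step
in a task after it; setting `trigger_rule="all_done"` on the release task is one way to make
it run whether the Spark job fails or passes.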

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
