airflow-dev mailing list archives

From "Van Klaveren, Brian N." <b...@slac.stanford.edu>
Subject Re: Task partitioning using Airflow
Date Wed, 09 Aug 2017 17:19:42 GMT
Have you looked into subdags?
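A SubDAG lets you generate one task per partition key dynamically. Something along these lines might be a starting point (an untested sketch against the 1.x Python API; PARTITIONS and train_model_for_partition are placeholders for your own partition keys and training entry point):

    # Sketch: fan the model training out over a list of partition keys by
    # generating one PythonOperator per key inside a SubDAG.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from airflow.operators.subdag_operator import SubDagOperator

    PARTITIONS = ["region_a", "region_b", "region_c"]  # your partition key values

    def train_model_for_partition(partition):
        # call into your existing R/Python training code for one partition
        pass

    def training_subdag(parent_dag_id, child_task_id, start_date, schedule_interval):
        # SubDAG dag_id must be "<parent_dag_id>.<child_task_id>"
        subdag = DAG(
            dag_id="%s.%s" % (parent_dag_id, child_task_id),
            start_date=start_date,
            schedule_interval=schedule_interval,
        )
        for partition in PARTITIONS:
            PythonOperator(
                task_id="train_%s" % partition,
                python_callable=train_model_for_partition,
                op_kwargs={"partition": partition},
                dag=subdag,
            )
        return subdag

    dag = DAG(
        dag_id="model_training",
        start_date=datetime(2017, 8, 1),
        schedule_interval="@daily",
    )

    train_all = SubDagOperator(
        task_id="train_all_partitions",
        subdag=training_subdag("model_training", "train_all_partitions",
                               dag.start_date, dag.schedule_interval),
        dag=dag,
    )

With the CeleryExecutor, the per-partition tasks can then be picked up in parallel by different worker machines.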

Brian


> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <ashish.rawat@myntra.com> wrote:
> 
> Thanks George. Our use case also involves periodic scheduling (daily), as well as task
> dependencies, which is why we chose Airflow. However, some of the tasks in a DAG have now
> become too big to execute on one node, and we want to split them into multiple tasks to
> reduce execution time. Would you recommend firing parts of an Airflow DAG in another
> framework?
> 
> --
> Regards,
> Ashish
> 
> 
> 
>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman <george@cloverhealth.com.INVALID> wrote:
>> 
>> Airflow is best for situations where you want to run different tasks that
>> depend on each other or process data that arrives over time. If your goal
>> is to take a large dataset, split it up, and process chunks of it, there
>> are probably other tools better suited to your purpose.
>> 
>> Off the top of my head, you might consider Dask:
>> https://dask.pydata.org/en/latest/ or directly using Celery:
>> http://www.celeryproject.org/
>> 
>> --George
>> 
>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.rawat@myntra.com> wrote:
>> 
>>> Hi - Can anyone please provide some pointers for this use case over
>>> Airflow?
>>> 
>>> --
>>> Regards,
>>> Ashish
>>> 
>>> 
>>> 
>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.rawat@myntra.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> We have a use case where we are running some R/Python based data science
>>>> models, which execute on a single node. The execution time of the models
>>>> is constantly increasing, and we are now planning to split the model
>>>> training by a partition key and distribute the workload over multiple
>>>> machines.
>>>> 
>>>> Does Airflow provide some simple way to split a task into multiple
>>>> tasks, each of which will work on a specific value of the key?
>>>> 
>>>> --
>>>> Regards,
>>>> Ashish
>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 

