beam-user mailing list archives

From Falcon Taylor-Carter <fal...@bounceexchange.com>
Subject Re: Advice on parallelizing network calls in DoFn
Date Thu, 15 Mar 2018 15:25:21 GMT
Hello Pablo,

Thanks for checking up (I'm working with Josh on this problem). It seems
there is currently no built-in mechanism for this kind of use case, and
that the best approach right now is to handle our own bundling and threading
in the DoFn. If you have any other suggestions, or anything we should keep in
mind while doing this, let us know!
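
For reference, here is a minimal sketch of the "bundle and thread inside the
DoFn" approach discussed in this thread. It is only an illustration under
stated assumptions, not an official Beam pattern: it assumes the Python SDK,
that elements arrive at the DoFn already grouped into lists, and that the
blocking call is thread-safe. The names ThreadedCallFn, fake_network_call and
run_locally are hypothetical. The Beam import is optional so that the sketch
also runs without Beam installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

try:
    import apache_beam as beam
    DoFnBase = beam.DoFn
except ImportError:  # allows the sketch to run without Beam installed
    DoFnBase = object


def fake_network_call(element):
    """Hypothetical stand-in for a blocking network request."""
    time.sleep(0.05)  # simulate network latency
    return element, len(str(element))


class ThreadedCallFn(DoFnBase):
    """Receives a batch (list) of elements and issues the calls concurrently."""

    def __init__(self, call, max_workers=8):
        super().__init__()
        self._call = call
        self._max_workers = max_workers
        self._pool = None  # created per bundle; thread pools are not picklable

    def start_bundle(self):
        self._pool = ThreadPoolExecutor(max_workers=self._max_workers)

    def process(self, batch):
        # Executor.map runs the calls on the pool and yields results in input
        # order. An exception raised by any call is re-raised here.
        for result in self._pool.map(self._call, batch):
            yield result

    def finish_bundle(self):
        self._pool.shutdown(wait=True)
        self._pool = None


def run_locally(fn, batches):
    """Drives the DoFn lifecycle by hand, outside of any Beam runner."""
    fn.start_bundle()
    try:
        return [result for batch in batches for result in fn.process(batch)]
    finally:
        fn.finish_bundle()
```

In a pipeline, this would sit behind a batching step, for example
pcoll | beam.BatchElements(min_batch_size=10, max_batch_size=100)
      | beam.ParDo(ThreadedCallFn(fake_network_call)),
where the batch sizes are arbitrary placeholders. Because every result is
yielded from process() before the method returns, no work is left pending when
the bundle is committed. The cost is that concurrency is capped by the batch
size.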

Falcon

On Tue, Mar 13, 2018 at 4:52 PM, Pablo Estrada <pabloem@google.com> wrote:

> I'd just like to close the loop. Josh, did you get an answer/guidance on
> how to proceed with your pipeline?
> Or maybe we'll need a new thread to figure that out : )
> Best
> -P.
>
>
> On Fri, Mar 9, 2018 at 1:39 PM Josh Ferge <josh.ferge@bounceexchange.com>
> wrote:
>
>> Hello all:
>>
>> Our team has pipelines that make external network calls. These pipelines
>> are currently very slow, and our hypothesis is that they are slow because
>> we are not threading our network calls. The GitHub pull request below
>> provides some discussion around this:
>>
>> https://github.com/apache/beam/pull/957
>>
>> In Beam 1.0, there was IntraBundleParallelization, which helped with
>> this. However, it was removed because it didn't comply with a few Beam
>> paradigms.
>>
>> Questions going forward:
>>
>> 1. What is advised for jobs that make blocking network calls? It seems
>>    that bundling the elements into groups of size X before passing them
>>    to the DoFn, and managing the threading within the function, might
>>    work. Thoughts?
>> 2. Are these types of jobs even suitable for Beam?
>> 3. Are there any plans to develop features that help with this?
>>
>> Thanks
>>
> --
> Got feedback? go/pabloem-feedback
>
