flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Expressing `grep` with many search terms in Flink
Date Wed, 04 Feb 2015 20:34:18 GMT
Hi Stefan,

Flink keeps only one copy of a broadcast variable per machine, and all
parallel tasks running on that machine share it.
Flink can also load the broadcast variable into a custom data structure.

Have a look at the getBroadcastVariableWithInitializer() method:

/**
 * Returns the result bound to the broadcast variable identified by the
 * given {@code name}. The broadcast variable is returned as a shared data structure
 * that is initialized with the given {@link BroadcastVariableInitializer}.
 * <p>
 * IMPORTANT: The broadcast variable data structure is shared between the parallel
 *            tasks on one machine. Any access that modifies its internal state needs to
 *            be manually synchronized by the caller.
 *
 * @param name The name under which the broadcast variable is registered;
 * @param initializer The initializer that creates the shared data structure of the broadcast
 *                    variable from the sequence of elements.
 * @return The broadcast variable, materialized as a list of elements.
 */
<T, C> C getBroadcastVariableWithInitializer(String name, BroadcastVariableInitializer<T, C> initializer);
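
For the grep use case, a minimal sketch of such an initializer might look like the following. It turns the broadcast search terms into a read-only set, so the shared structure never needs synchronization. The class names GrepTermsSketch and TermSetInitializer, the matches helper, and the sample terms are made up for illustration. The nested interface is only a local stand-in with the same shape as Flink's BroadcastVariableInitializer, so that the sketch compiles without Flink on the classpath. In a real job you would implement org.apache.flink.api.common.functions.BroadcastVariableInitializer instead and pass the initializer to getRuntimeContext().getBroadcastVariableWithInitializer(...) in the open() method of a rich function.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class GrepTermsSketch {

    // Local stand-in (assumption): mirrors the shape of Flink's
    // BroadcastVariableInitializer so this sketch runs without Flink.
    interface BroadcastVariableInitializer<T, C> {
        C initializeBroadcastVariable(Iterable<T> data);
    }

    // Hypothetical initializer: collects the broadcast search terms into a set.
    static class TermSetInitializer
            implements BroadcastVariableInitializer<String, Set<String>> {
        @Override
        public Set<String> initializeBroadcastVariable(Iterable<String> data) {
            Set<String> terms = new HashSet<>();
            for (String term : data) {
                terms.add(term);
            }
            // The structure is shared between parallel tasks on one machine,
            // so hand out a read-only view to rule out unsynchronized writes.
            return Collections.unmodifiableSet(terms);
        }
    }

    // Read-only lookup: true if the line contains at least one search term.
    static boolean matches(String line, Set<String> terms) {
        for (String term : terms) {
            if (line.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> terms = new TermSetInitializer()
                .initializeBroadcastVariable(Arrays.asList("error", "warn", "error"));
        System.out.println(terms.size());                         // prints 2
        System.out.println(matches("an error occurred", terms));  // prints true
        System.out.println(matches("all good", terms));           // prints false
    }
}
```

Because the set is built once per machine and only read afterwards, each parallel task can call matches on it without any locking.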

As far as I am aware, there is currently no easy way to run multiple tasks
one after the other.
However, we are working on materializing intermediate results. Once this
feature is available, it should be easy to do the grep steps one by one.

Cheers, Fabian
