flink-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Best way to trigger dataset sampling
Date Tue, 27 Sep 2016 12:35:44 GMT
Hi Flavio,

This is not really possible at the moment, though there is a workaround.
You can create a dummy jar file (it may be empty) and then use:

./flink run -C hdfs:///path/to/cluster.jar -c org.package.SampleClass /path/to/dummy.jar

That way Flink will include your cluster jar and you can load all classes from it.
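A client that needs the sample could drive this workaround from Java. The sketch below is hypothetical (the class FlinkRunCommand, the placeholder paths and the "--size" program argument are assumptions, not part of Flink): it writes an empty dummy jar with java.util.jar and assembles the `flink run` argument list, which could then be handed to ProcessBuilder. Reading the results back from the process output assumes the sampling job prints them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class FlinkRunCommand {

    /** Writes an empty jar (manifest only) to use as the dummy jar-file argument. */
    static Path writeDummyJar(Path target) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version is the attribute every manifest is expected to declare
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(target), manifest)) {
            // no class entries: the real classes come from the jar passed with -C
        }
        return target;
    }

    /** Builds: <flinkBin> run -C <classpathUrl> -c <mainClass> <dummyJar> [programArgs...] */
    static List<String> build(String flinkBin, String classpathUrl, String mainClass,
                              Path dummyJar, String... programArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
            flinkBin, "run",
            "-C", classpathUrl,      // added to the user-code classpath, must be reachable from all nodes
            "-c", mainClass,         // entry point, loaded from the -C jar
            dummyJar.toString()));   // positional jar-file argument expected by "flink run"
        cmd.addAll(Arrays.asList(programArgs)); // passed on to the main(String[]) of mainClass
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        Path dummy = writeDummyJar(Files.createTempFile("dummy", ".jar"));
        // "--size 100" is a hypothetical argument understood by the sampling job
        List<String> cmd = build("./flink", "hdfs:///path/to/cluster.jar",
            "org.package.SampleClass", dummy, "--size", "100");
        System.out.println(String.join(" ", cmd));
        // launching it requires a local Flink installation, e.g.:
        // Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
    }
}
```

Keeping the command as a list (rather than one shell string) avoids quoting problems when program arguments contain spaces.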

Alternatively, with the RemoteEnvironment it looks like this:

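import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.RemoteEnvironment;
import org.apache.flink.configuration.Configuration;

public static void main(String[] args) throws Exception {

   // jars added to the user-code classpath on the cluster (must be reachable from all nodes)
   final URL[] classpath = new URL[]{
      new URL("file:///path/to/sample.jar"),
      new URL("file:///Users/max/Dev/flink/build-target/lib/flink-dist_2.10-1.2-SNAPSHOT.jar")};

   final RemoteEnvironment env = new RemoteEnvironment(
      "jobmanager-host", 6123,  // JobManager host and port (placeholders)
      new Configuration(),
      new String[0],            // no jar files to ship
      classpath);

   // load the sampling classes locally from the same URLs
   // (reuse the array instead of env.globalClasspaths, which is not publicly accessible)
   URLClassLoader classLoader = new URLClassLoader(classpath);

   Class<?> clazz = classLoader.loadClass("org.package.sample.SampleClass");

   // parameter types must match the declaration, assumed here: sampleMethod(ExecutionEnvironment)
   Method sampleMethod = clazz.getDeclaredMethod("sampleMethod", ExecutionEnvironment.class);

   // pass environment as an argument to your (static) sample method
   // the method should return the results of the execution
   Object sampleResult = sampleMethod.invoke(null, env);
}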

Beware, this is extremely hacky. We should have a better way to invoke jar
files remotely. Honestly, the best option is to keep a local copy of your
sampling jars and work directly with them.


On Tue, Sep 27, 2016 at 12:25 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

> Hi Max,
> Actually I have a jar containing sampling jobs and I need to collect
> results from a client.
> I've tried to use ExecutionEnvironment.createRemoteEnvironment but I fear
> that it's not the right way to do that because
> I just need to tell the cluster the main class and the parameters to run
> the job (and where the jar file is on HDFS).
> Best,
> Flavio
> On Tue, Sep 27, 2016 at 12:06 PM, Maximilian Michels <mxm@apache.org>
> wrote:
>> Hi Flavio,
>> Do you want to sample from a running batch job? That would be similar to
>> Queryable State in streaming jobs, but that is not supported in batch
>> mode.
>> Cheers,
>> Max
>> On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier
>> <pompermaier@okkam.it> wrote:
>> > Hi to all,
>> >
>> > I have a use case where I need to tell a Flink cluster to give me a
>> > sample of X records using parameterizable sampling functions. Is there
>> > any best practice or advice for doing that?
>> >
>> > Should I create a Remote ExecutionEnvironment or should I use the Flink
>> > client (I don't know if it uses REST services or RPC or whatever)?
>> > Is there any java snippet for that?
>> >
>> > Best,
>> > Flavio
>> >
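For the "sample of X records" part of the question, Flink's DataSet API provides DataSetUtils.sampleWithSize(...) in org.apache.flink.api.java.utils. To illustrate what a size-parameterized sampling function does, here is a plain-Java sketch of reservoir sampling (Algorithm R). The ReservoirSampler class and its sample method are hypothetical names, and this is not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

class ReservoirSampler {

    /**
     * Returns a uniform random sample of at most k records, without replacement,
     * in a single pass over the input (Algorithm R).
     */
    static <T> List<T> sample(Iterator<T> input, int k, long seed) {
        if (k < 0) throw new IllegalArgumentException("k must be >= 0");
        Random rnd = new Random(seed);          // fixed seed makes the sample reproducible
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (input.hasNext()) {
            T record = input.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(record);          // fill the reservoir with the first k records
            } else {
                // pick a random index in [0, seen); keep the record with probability k / seen
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, record);
                }
            }
        }
        return reservoir;
    }
}
```

The sample size k and the seed are the parameters; because the input is consumed as an iterator, the whole dataset never has to fit in memory, only the k sampled records.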
