gobblin-user mailing list archives

From Abhishek Tiwari <finda...@gmail.com>
Subject Re: Set JobConf Dir in Gobblin CLI
Date Thu, 25 Jan 2018 11:15:29 GMT
We do that through a mix of Gobblin-as-a-Service (GaaS) and a Standalone
cluster: we invoke the GaaS REST API to submit on-demand one-time jobs, as
well as to schedule jobs for execution. In this setup the scheduler also
runs in the Service, and the cluster just listens for jobs from the
Service. The Service and the Cluster talk through Kafka.

Abhishek

On Thu, Jan 25, 2018 at 2:18 AM, Birger Kamp <birger.kamp@proum.de> wrote:

> Hi Abhishek,
>
> thanks for that clarification!
>
> I’ve got some jobs (10? 20? some…) and I want to run them on demand.
> So I want something that doesn’t run as a daemon and quits once all
> jobs are done.
> I can’t find an option for the Standalone Cluster to do this.
>
> Is there any workaround for this?
>
> Thanks and regards
> Birger
>
>
> On 25. Jan 2018, at 11:13, Abhishek Tiwari <abti@apache.org> wrote:
>
> Hi Birger,
>
> The property ConfigurationKeys.JOB_CONFIG_FILE_GENERAL_PATH_KEY is not
> supported by the Gobblin CLI application because it takes a slightly
> different code path and, as you observed, is written to execute only one
> job.
> You can run that one job either by specifying a template or by pointing to
> a pull file via jobFile() in your App (not both: if a jobFile is present,
> the template is not resolved).
>
> The workaround is another mode, standalone instance mode, which is
> invoked via gobblin-standalone.sh:
> https://gobblin.readthedocs.io/en/latest/user-guide/Gobblin-Deployment/#standalone-deployment
>
> Note, to avoid confusion if you attempt to debug both modes: the code
> paths that gobblin-standalone.sh and the Gobblin CLI take are a bit
> different from entry to job scheduling. The job execution is the same
> thereafter.
>
> Regards,
> Abhishek
>
> On Thu, Jan 25, 2018 at 1:01 AM, Birger Kamp <birger.kamp@proum.de> wrote:
>
>> Hi friends of Gobblin,
>>
>> I’m trying to build a Gobblin CLI application which runs multiple jobs.
>> For this I copied the WikipediaExampleApp and modified it for myself. Let’s
>> name it “MyApp”.
>>
>> MyApp has just a single important method and that’s the constructor:
>>
>> public IngestionApp(String pathToJobs, String pathToJar) {
>>    super("IngestAllJobs");
>>    this.setConfiguration(ConfigurationKeys.JOB_CONFIG_FILE_GENERAL_PATH_KEY, pathToJobs);
>>    this.distributeJar(pathToJar);
>> }
>>
>> There you can see how I’m configuring the fully qualified path to the
>> jobs. This already works with the Standalone Cluster, so it should also
>> work for MyApp.
>>
>> When I run "bin/gobblin run ingestAllJobs hdfs://localhost:9000/jobs
>> path/to/distribute_dependencies.jar", the QuickApp starts and then stops
>> with an NPE:
>>
>> 2018-01-25 09:48:12 WARN  JobContext - Property task.data.root.dir is missing.
>> 2018-01-25 09:48:12 ERROR EmbeddedGobblin - Job launch failed: java.lang.RuntimeException: JobLauncher creation failed: java.lang.RuntimeException: Failed to create job launcher: java.lang.NullPointerException
>> java.lang.RuntimeException: JobLauncher creation failed: java.lang.RuntimeException: Failed to create job launcher: java.lang.NullPointerException
>>     at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver.createLauncher(JobLauncherExecutionDriver.java:177)
>>     at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver.create(JobLauncherExecutionDriver.java:121)
>>     at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$Launcher.launchJob(JobLauncherExecutionDriver.java:453)
>>     at org.apache.gobblin.runtime.instance.DefaultGobblinInstanceDriverImpl$JobSpecRunnable.run(DefaultGobblinInstanceDriverImpl.java:209)
>>     at org.apache.gobblin.runtime.scheduler.AbstractJobSpecScheduler$TriggerRunnable.run(AbstractJobSpecScheduler.java:177)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.RuntimeException: Failed to create job launcher: java.lang.NullPointerException
>>     at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:120)
>>     at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:85)
>>     at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver.createLauncher(JobLauncherExecutionDriver.java:174)
>>     ... 5 more
>> Caused by: java.lang.NullPointerException
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:264)
>>     at org.apache.gobblin.runtime.JobContext.createSource(JobContext.java:242)
>>     at org.apache.gobblin.runtime.JobContext.<init>(JobContext.java:172)
>>     at org.apache.gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:183)
>>     at org.apache.gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:77)
>>     at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:106)
>>     ... 7 more
>> 2018-01-25 09:48:16 WARN  EmbeddedGobblin - Timeout waiting for job to start. Aborting.
>> 2018-01-25 09:48:16 INFO  EmbeddedGobblin - Shutting down driver …
>>
>> After some investigation I’m not sure the QuickApp is even built to
>> work this way. It seems you can only configure a single job in a
>> QuickApp, either by defining a concrete Source, Extractor, Converter and
>> so on, or by using a template.
>> Configuring a whole job directory does not seem to work. Is that
>> correct? Is there a workaround? Or do I have to touch the Gobblin source
>> code to get this?
>>
>> Thank you for all your help! (In Mails and Gitter!)
>> Birger
>>
>
>
>
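
Editorial sketch, not part of the original thread: since the reply above says the CLI code path executes exactly one job per app, one way to get "run everything on demand, then quit" is a small driver that lists the pull files in a job directory and launches them one at a time. The Java below is a minimal sketch under stated assumptions. The class OnDemandJobBatch, the SingleJobRunner interface and the ".pull" suffix filter are hypothetical names chosen for illustration, not Gobblin APIs, and the directory is assumed to be on the local filesystem rather than HDFS. The runner is where a real single-job launch (for example an app configured with jobFile(), as mentioned in the thread) would be plugged in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OnDemandJobBatch {

    /** Hypothetical hook: launches ONE job from one pull file and blocks until it ends. */
    @FunctionalInterface
    interface SingleJobRunner {
        void run(Path pullFile) throws Exception;
    }

    /** Returns the regular files ending in ".pull" directly under jobDir, sorted by path. */
    static List<Path> listPullFiles(Path jobDir) {
        try (Stream<Path> entries = Files.list(jobDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".pull"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs every pull file in order and returns the ones whose run threw an exception. */
    static List<Path> runAll(List<Path> pullFiles, SingleJobRunner runner) {
        List<Path> failed = new ArrayList<>();
        for (Path pullFile : pullFiles) {
            try {
                runner.run(pullFile);
            } catch (Exception e) {
                // Keep going so one broken job does not block the rest of the batch.
                failed.add(pullFile);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: OnDemandJobBatch <local-job-dir>");
            System.exit(2);
        }
        List<Path> jobs = listPullFiles(Paths.get(args[0]));
        // Placeholder runner: replace the body with a real single-job launch.
        List<Path> failed = runAll(jobs, pullFile -> System.out.println("would launch " + pullFile));
        // The process ends once all jobs have been attempted: no daemon, no scheduler.
        System.exit(failed.isEmpty() ? 0 : 1);
    }
}
```

Running the jobs sequentially keeps the sketch simple and makes the exit code meaningful; whether launching several single-job apps in one JVM is safe depends on Gobblin internals that this thread does not cover.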
