gobblin-user mailing list archives

From Abhishek Tiwari <a...@apache.org>
Subject Re: GAAS feedback.
Date Mon, 02 Apr 2018 23:45:53 GMT
Hi Vicky,

I followed up with Sudarshan (cc'd); he will document his multi-hop
design and thoughts on the open source wiki after a bit of cleanup.

Regards
Abhishek

On Mon, Mar 26, 2018 at 11:47 PM, Vicky Kak <vicky.kak@gmail.com> wrote:

> Hi Abhishek,
>
> I did not get a chance to follow up on this, hence the delayed response;
> sorry about that. In our last meeting there was an explanation of
> Multi-Hop. Would it be possible to describe it here?
>
> Thanks,
> Vicky
>
> On Tue, Jan 23, 2018 at 5:14 AM, Abhishek Tiwari <abti@apache.org> wrote:
>
>> Hi Vicky,
>>
>> Responses inline. Great suggestions and questions; I agree with each one,
>> so please feel free to create Jiras for each.
>>
>> Also, apologies for the late reply.
>>
>> Regards,
>> Abhishek
>>
>> On Tue, Jan 9, 2018 at 3:24 AM, Vicky Kak <vicky.kak@gmail.com> wrote:
>>
>> > Hi Guys,
>> >
>> > I have finally managed to install GAAS with a Standalone Cluster.
>> >
>> > Here are some observations to share:
>> >
>> > 1) I am running GAAS and the Standalone cluster on the same machine and
>> > from the same distribution, which is typical for a quick setup.
>> > Since I start both GAAS and the Standalone master from the same
>> > distribution, they both direct their logs to the same master.out file,
>> > so the logging output of GAAS and the standalone master overlaps. I have
>> > changed the log file from master.out to clustermaster.out in my local
>> > setup by changing $GOBBLIN_HOME/bin/gobblin-cluster-master.sh as follows:
>> >
>> >
>> >    nohup $COMMAND >clustermaster.out 2>&1 & echo $! > $PID
>> >
>> >    We should make this change in the distribution.
>> >
>> I generally run two distributions at different locations to keep the
>> workspace / installation clean for each. But I see the advantage of using
>> one (attaching a debugger with a single IDE instance, etc.), so it would be
>> a good idea to segregate the logging for both. We should create a Jira for
>> this.
>>
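
A minimal sketch of one way to segregate the logs, assuming each launcher script knows its own component name. The helper `log_file_for` and the component names are hypothetical and are not part of the Gobblin distribution; only the `nohup` line comes from the message above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a component name to its own output file so that
# GAAS and the standalone master never share master.out.
log_file_for() {
  local component="$1"   # e.g. "clustermaster" (name chosen for illustration)
  local log_dir="${2:-.}" # optional log directory, defaults to the current one
  printf '%s/%s.out\n' "$log_dir" "$component"
}

# Usage sketch inside a launcher script, where COMMAND and PID are already set:
#   nohup $COMMAND >"$(log_file_for clustermaster)" 2>&1 & echo $! > $PID
```

Deriving the file name from the component keeps a single distribution usable for both processes without hardcoding a second file name in each script.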
>> >
>> >
>> > 2) The log4j logging configuration is dynamically controlled in the
>> > standalone/worker implementation; it does not work by default. I looked
>> > at how the log4j configuration is controlled in other modes: it is
>> > done via the bootstrap scripts, e.g. gobblin-aws.sh:
>> >   LOG4J_PATH=file://${FWDIR_CONF}/log4j-aws.properties
>> >   COMMAND="$JAVA_HOME/bin/java -cp $CLASSPATH $JVM_FLAGS gobblin.aws.GobblinAWSClusterLauncher -D log4j.configuration=$LOG4J_PATH"
>> >
>> > I see the log4j configuration similarly being set in
>> > gobblin-standalone.sh too:
>> >   COMMAND+="-Dlog4j.configuration=file://$FWDIR_CONF/log4j-standalone.xml "
>> >
>> > I made similar changes to gobblin-service.sh:
>> >
>> >   LOG4J_PATH=file://${FWDIR_CONF}/log4j-cluster.properties
>> >   COMMAND="$JAVA_HOME/bin/java -Dlog4j.debug -Dlog4j.configuration=$LOG4J_PATH -cp $CLASSPATH $JVM_FLAGS gobblin.service.modules.core.GobblinServiceManager --service_name $SERVICE_NAME $LOG_ARGS"
>> >
>> > This was done because the log4j configuration for GAAS, which should
>> > have been taken from $GOBBLIN_HOME/conf/service/log4j-cluster.properties,
>> > was not being taken from there; it was taken from
>> > $GOBBLIN_HOME/lib/generator-2.6.0.jar.
>> >
>> > We should keep a consistent model for loading log4j: for the
>> > standalone cluster the log4j configuration is loaded via code, while
>> > for the other Gobblin components (modes) it is set in the bootstrap
>> > scripts. I think setting it in the bootstrap scripts via
>> > -Dlog4j.configuration is a good option.
>> >
>> > I have to copy log4j-cluster.properties into GOBBLIN_HOME/bin to
>> > run the Standalone cluster master/worker node.
>> > We need to fix these log4j configuration issues.
>> >
>> Thanks, yes, this should be made consistent.
>>
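
A small sketch of what a consistent bootstrap-script model could look like, assuming every launcher builds the flag through one shared function. The function name `log4j_flag` and the example directory are hypothetical; only the `-Dlog4j.configuration=file://...` form is taken from the scripts quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical shared helper: every launcher script builds the log4j JVM
# flag the same way, given its conf directory and config file name.
log4j_flag() {
  local conf_dir="$1"   # e.g. the service's conf directory
  local conf_file="$2"  # e.g. log4j-cluster.properties
  printf -- '-Dlog4j.configuration=file://%s/%s\n' "$conf_dir" "$conf_file"
}

# Usage sketch: JVM options such as -D must appear before the main class,
# otherwise the JVM passes them to the application as plain arguments.
#   COMMAND="$JAVA_HOME/bin/java $(log4j_flag "$FWDIR_CONF" log4j-cluster.properties) -cp $CLASSPATH $JVM_FLAGS <main class>"
```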
>> >
>> > 3) The Gobblin service should have its REST port configurable via a
>> > properties file; currently we have to read it from the master.out log
>> > file. I have to check how to get it using the d2 client as per the
>> > Rest.li framework.
>> >
>> Yes, this is pending. Internally, we run within a wrapper Jetty container
>> that has a fixed port, so this has slipped in priority and no one has
>> addressed it so far. Good reminder.
>>
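
As an illustration only of reading a port from a properties file with a fallback: the helper `get_property` and the key `hypothetical.rest.port` are assumptions made for this sketch, since the thread does not name an actual Gobblin property for the REST port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read key=value from a Java-style properties file and
# fall back to a default when the file or the key is missing.
get_property() {
  local file="$1" key="$2" default="$3"
  local value=""
  if [ -f "$file" ]; then
    # Exact match on the key; the last matching line wins.
    value="$(awk -F= -v k="$key" '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$file")"
  fi
  printf '%s\n' "${value:-$default}"
}

# Usage sketch (file name and key are placeholders):
#   REST_PORT="$(get_property "$FWDIR_CONF/service.properties" hypothetical.rest.port 8080)"
```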
>>
>> > 4) We need a SQL-based TopologyStore, i.e. we should implement a
>> > pluggable MySQL-based TopologyStore.
>> >
>> +1
>>
>> >
>> > 5) Capabilities are hardcoded into the configuration files. It would be
>> > good to have the capabilities configured in the corresponding job pull
>> > file and propagated to GAAS when required.
>> >
>> Yes, that's where we intend to move. We started with static
>> configuration as v0, but should add ZK-based registration or other
>> dynamic ways to announce and discover capabilities. I believe Sudarshan is
>> looking into multi-hop with a somewhat broader vision and might touch upon
>> this too.
>>
>> >
>> > 6) The Standalone master does not start unless this property is
>> > configured:
>> >
>> > gobblin.cluster.jobconf.fullyQualifiedPath
>> >
>> > Here is the exception that I see when it is not configured:
>> >
>> > 2018-01-09 13:25:11 IST DEBUG [main] org.apache.hadoop.security.UserGroupInformation - UGI loginUser:vicky (auth:SIMPLE)
>> >
>> > Exception in thread "main" java.lang.NullPointerException: at index 2
>> > at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:240)
>> > at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:231)
>> > at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)
>> > at com.google.common.collect.ImmutableList.construct(ImmutableList.java:303)
>> > at com.google.common.collect.ImmutableList.of(ImmutableList.java:107)
>> > at gobblin.cluster.GobblinClusterManager.create(GobblinClusterManager.java:408)
>> > at gobblin.cluster.GobblinClusterManager.buildJobConfigurationManager(GobblinClusterManager.java:400)
>> > at gobblin.cluster.GobblinClusterManager.initializeAppLauncherAndServices(GobblinClusterManager.java:198)
>> > at gobblin.cluster.GobblinClusterManager.<init>(GobblinClusterManager.java:164)
>> > at gobblin.cluster.GobblinClusterManager.main(GobblinClusterManager.java:743)
>> >
>> >
>> > Since the configuration reads the job data from a Kafka queue, the
>> > following configuration should not be needed:
>> >
>> > gobblin.cluster.jobconf.fullyQualifiedPath=/home/vicky/development/gobblin/gobblin-dist-0.10.0/cluster-job-config-bpu1
>> >
>> > I am going to look into this again; I am not sure if I am missing
>> > anything.
>> >
>> Seems redundant, and like a bug.
>>
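
Until the property is made optional, a launcher could fail fast with a readable message instead of the NullPointerException. This is only a sketch under assumptions: the helper `require_property` and the idea of checking in the launcher script are hypothetical; only the property name comes from the report above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: return non-zero with a clear message when a
# required key is absent from a properties file.
require_property() {
  local file="$1" key="$2"
  # Exact match on the key name; awk exits 0 only if the key was found.
  if ! awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
    echo "Missing required property: ${key} (in ${file})" >&2
    return 1
  fi
}

# Usage sketch (the config file path is a placeholder):
#   require_property "$CLUSTER_CONF" gobblin.cluster.jobconf.fullyQualifiedPath || exit 1
```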
>> >
>> > Thanks,
>> > Vicky
>> >
>> >
>> >
>>
>
>
