flink-user mailing list archives

From "LINZ, Arnaud" <AL...@bouyguestelecom.fr>
Subject RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
Date Wed, 15 Jun 2016 13:14:56 GMT
Oops...
My mistake: snapshot/restore does work in a local environment; I had a weird configuration issue!

But I still have the properties file path issue :)

-----Original Message-----
From: LINZ, Arnaud
Sent: Wednesday, June 15, 2016 14:35
To: 'user@flink.apache.org' <user@flink.apache.org>
Subject: RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

Hi,

I haven't had the time to investigate the bad configuration file path issue yet (any idea
why yarn.properties-file.location is ignored is welcome), but I'm facing another
HA problem.

I'm trying to make my custom streaming sources HA-compliant by implementing snapshotState()
and restoreState(). I would like to test that mechanism in my JUnit tests, because it can
be complex, but I was unable to simulate a recovery in a local Flink environment: snapshotState()
is never triggered, and throwing an exception inside the execution chain does not lead to
recovery but ends the execution, despite the streamExecEnv.enableCheckpointing(timeout) call.

Is there a way to test this mechanism locally (other than poorly simulating it by explicitly
calling snapshot and restore in an overridden source)?

Thanks,
Arnaud

-----Original Message-----
From: LINZ, Arnaud
Sent: Monday, June 6, 2016 17:53
To: user@flink.apache.org
Subject: RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

I've deleted the '/tmp/.yarn-properties-user' file created for the persistent container,
and the batches do go into their own container. However, that's not a workable workaround,
as I'm no longer able to submit streaming apps to the persistent container that way :) So
it's really a problem of Flink finding the right properties file.
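
The delete-and-resubmit workaround described above could be made reversible by moving the properties file aside only while the batch is submitted. The following is a minimal sketch, not a tested Flink recipe: the function name run_without_yarn_properties is hypothetical, and it assumes that no streaming job is submitted to the persistent container while the file is hidden.

```shell
# Hypothetical helper (not part of Flink): hide a YARN properties file while
# a command runs, then restore it whatever the command's exit status.
# Usage: run_without_yarn_properties <properties-file> <command> [args...]
run_without_yarn_properties() {
    props_file="$1"
    shift
    backup="${props_file}.hidden.$$"   # backup name made unique with the shell PID
    moved=0

    # Move the file aside only if it exists.
    if [ -f "$props_file" ]; then
        mv "$props_file" "$backup" && moved=1
    fi

    # Run the wrapped command and remember its exit status.
    status=0
    "$@" || status=$?

    # Put the file back so later submissions can find the persistent session.
    if [ "$moved" -eq 1 ]; then
        mv "$backup" "$props_file"
    fi

    return "$status"
}
```

It would be called with the properties file as the first argument, followed by the usual batch command line (for example the "$FLINK_DIR/flink run" invocation with its options). The window during which the file is missing is the main weakness of this approach, so it only narrows the problem rather than solving it.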

I've added -yD yarn.properties-file.location=/tmp/flink/batch to the batch command line
(also configured in the JVM_ARGS variable), with no change in behaviour. Note that I do have a
standalone yarn container created, but the job is submitted to the other one.

Thanks,
Arnaud

-----Original Message-----
From: Ufuk Celebi [mailto:uce@apache.org]
Sent: Monday, June 6, 2016 16:01
To: user@flink.apache.org
Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

Thanks for the clarification. I think it might be related to the YARN properties file, which is
still being used for the batch jobs. Can you try deleting it between submissions as a temporary
workaround to check whether it's related?

– Ufuk

On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr> wrote:
> Hi,
>
> The ZooKeeper path is only for my persistent container, and I do use a different one
> for all my persistent containers.
>
> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
> ("${JVM_ARGS} -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch")
>
> I've tried using -yD recovery.mode=standalone on the flink command line too, but it does
> not solve the problem; it still uses the pre-existing container.
>
> Complete line =
> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu batch1 \
>   -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s \
>   -yD recovery.mode=standalone \
>   --class com.bouygtel.kubera.main.segstage.MainGeoSegStage \
>   /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar \
>   -j /usr/users/datcrypt/alinz/KBR/GOS/log \
>   -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
>
> JVM_ARGS =
> -Drecovery.mode=standalone
> -Dyarn.properties-file.location=/tmp/flink/batch
>
>
> Arnaud
>
>
> -----Original Message-----
> From: Ufuk Celebi [mailto:uce@apache.org]
> Sent: Monday, June 6, 2016 14:37
> To: user@flink.apache.org
> Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
>
> Hey Arnaud,
>
> The cause of this is probably that both jobs use the same ZooKeeper root path, in which
> case all task managers connect to the same leading job manager.
>
> I think you forgot to add the y in -Drecovery.mode=standalone for the batch jobs,
> e.g.
>
> -yDrecovery.mode=standalone
>
> Can you try this?
>
> – Ufuk
>
> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr> wrote:
>> Hi,
>>
>>
>>
>> I use Flink 1.0.0. I have a persistent yarn container set (a 
>> persistent flink job manager) that I use for streaming jobs ; and I 
>> use the “yarn-cluster” mode to launch my batches.
>>
>>
>>
>> I’ve just switched “HA” mode on for my persistent streaming job
>> manager and it seems to work; however, my batches are no longer
>> working because they now execute inside the persistent container
>> (and fail because it lacks slots) and not in a separate standalone
>> job manager.
>>
>>
>>
>> My batch launch options:
>>
>>
>>
>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm 
>> $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD 
>> yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD akka.ask.timeout=300s"
>>
>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone 
>> -Dyarn.properties-file.location=/tmp/flink/batch"
>>
>>
>>
>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA 
>> $JAR_SUPP $listArgs $ACTION
>>
>>
>>
>> My persistent cluster launch options:
>>
>>
>>
>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10
>> -Drecovery.mode=zookeeper
>> -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS}
>> -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH}
>> -Dstate.backend=filesystem
>> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints
>> -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
>>
>>
>>
>> $FLINK_DIR/yarn-session.sh
>> -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO
>> $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS -tm 
>> $FLINK_MEMORY -qu $FLINK_QUEUE  -nm ${GANESH_TYPE_PF}_KuberaFlink
>>
>>
>>
>> I’ve switched back to the FLINK_HA_OPTIONS="" way of launching the 
>> container for now, but I lack HA.
>>
>>
>>
>> Is it a (un)known bug or am I missing a magic option?
>>
>>
>>
>> Best regards,
>>
>> Arnaud