flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
Date Mon, 20 Jun 2016 12:23:18 GMT
I've created an issue for this here:
https://issues.apache.org/jira/browse/FLINK-4095

On Mon, Jun 20, 2016 at 11:09 AM, Maximilian Michels <mxm@apache.org> wrote:
> +1 for a CLI parameter for loading the config from a custom location
>
> On Thu, Jun 16, 2016 at 6:01 PM, Till Rohrmann <trohrmann@apache.org> wrote:
>> Hi Arnaud,
>>
>> at the moment the environment variable is the only way to specify a
>> different config directory for the CLIFrontend. But it totally makes sense
>> to introduce a --configDir parameter for the flink shell script. I'll open
>> an issue for this.
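A minimal sketch of the environment-variable workaround described above (the config directory path is a placeholder, not from the thread):

```shell
# Create a custom config directory holding a flink-conf.yaml and point the
# CLI frontend at it via FLINK_CONF_DIR before invoking ./bin/flink.
# /tmp/flink-batch-conf is a hypothetical path used only for illustration.
mkdir -p /tmp/flink-batch-conf
cat > /tmp/flink-batch-conf/flink-conf.yaml <<'EOF'
yarn.properties-file.location: /tmp/flink/batch
EOF
export FLINK_CONF_DIR=/tmp/flink-batch-conf
# ./bin/flink run ...   (rest of the submission unchanged)
echo "using config dir: $FLINK_CONF_DIR"
```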
>>
>> Cheers,
>> Till
>>
>> On Thu, Jun 16, 2016 at 5:36 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr>
>> wrote:
>>>
>>> Okay, is there a way to specify the flink-conf.yaml to use on the
>>> ./bin/flink command line? I see no such option. I guess I have to set
>>> FLINK_CONF_DIR before the call?
>>>
>>> -----Original Message-----
>>> From: Maximilian Michels [mailto:mxm@apache.org]
>>> Sent: Wednesday, June 15, 2016 18:06
>>> To: user@flink.apache.org
>>> Subject: Re: Yarn batch not working with standalone yarn job manager once a
>>> persistent, HA job manager is launched ?
>>>
>>> Hi Arnaud,
>>>
>>> One issue per thread please. That makes things a lot easier for us :)
>>>
>>> Something positive first: We are reworking the resuming of existing Flink
>>> Yarn applications. It'll be much easier to resume a cluster simply by using
>>> the Yarn ID or by rediscovering the Yarn session via the properties file.
>>>
>>> The dynamic properties are a shortcut for modifying the Flink configuration
>>> of the cluster _only_ upon startup. Afterwards, they are already fixed in
>>> the containers. We might change this for the 1.1.0 release. It should work if
>>> you put "yarn.properties-file.location:
>>> /custom/location" in your flink-conf.yaml before you execute
>>> "./bin/flink".
>>>
>>> Cheers,
>>> Max
>>>
>>> On Wed, Jun 15, 2016 at 3:14 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr>
>>> wrote:
>>> > Ooopsss....
>>> > My mistake, snapshot/restore does work in a local env; I had a weird
>>> > configuration issue!
>>> >
>>> > But I still have the property file path issue :)
>>> >
>>> > -----Original Message-----
>>> > From: LINZ, Arnaud
>>> > Sent: Wednesday, June 15, 2016 14:35
>>> > To: 'user@flink.apache.org' <user@flink.apache.org>
>>> > Subject: RE: Yarn batch not working with standalone yarn job manager
>>> > once a persistent, HA job manager is launched ?
>>> >
>>> > Hi,
>>> >
>>> > I haven't had the time to investigate the bad configuration file path
>>> > issue yet (if you have any idea why yarn.properties-file.location is
>>> > ignored, you are welcome to share it), but I'm facing another HA problem.
>>> >
>>> > I'm trying to make my custom streaming sources HA compliant by
>>> > implementing snapshotState() & restoreState(). I would like to test that
>>> > mechanism in my JUnit tests, because it can be complex, but I was unable
>>> > to simulate a "recover" on a local Flink environment: snapshotState() is
>>> > never triggered, and throwing an exception inside the execution chain
>>> > does not lead to recovery but ends the execution, despite the
>>> > streamExecEnv.enableCheckpointing(timeout) call.
>>> >
>>> > Is there a way to locally test this mechanism (other than poorly
>>> > simulating it by explicitly calling snapshot & restore on an overridden
>>> > source)?
>>> >
>>> > Thanks,
>>> > Arnaud
>>> >
>>> > -----Original Message-----
>>> > From: LINZ, Arnaud
>>> > Sent: Monday, June 6, 2016 17:53
>>> > To: user@flink.apache.org
>>> > Subject: RE: Yarn batch not working with standalone yarn job manager once
>>> > a persistent, HA job manager is launched ?
>>> >
>>> > I've deleted the '/tmp/.yarn-properties-user' file created for the
>>> > persistent container, and the batches now go into their own container.
>>> > However, that's not a workable workaround, as I'm no longer able to submit
>>> > streaming apps to the persistent container that way :) So it's really a
>>> > problem of Flink finding the right property file.
>>> >
>>> > I've added -yD yarn.properties-file.location=/tmp/flink/batch to the
>>> > batch command line (also configured in the JVM_ARGS var), with no change
>>> > of behaviour. Note that I do have a standalone yarn container created,
>>> > but the job is submitted to the other one.
>>> >
>>> > Thanks,
>>> > Arnaud
>>> >
>>> > -----Original Message-----
>>> > From: Ufuk Celebi [mailto:uce@apache.org]
>>> > Sent: Monday, June 6, 2016 16:01
>>> > To: user@flink.apache.org
>>> > Subject: Re: Yarn batch not working with standalone yarn job manager once
>>> > a persistent, HA job manager is launched ?
>>> >
>>> > Thanks for the clarification. I think it might be related to the YARN
>>> > properties file, which is still being used for the batch jobs. Can you
>>> > try deleting it between submissions, as a temporary workaround, to check
>>> > whether it's related?
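As a concrete sketch of that workaround (the default per-user file location for Flink 1.0 is assumed here):

```shell
# The CLI caches the running YARN session's connection info in a per-user
# properties file; while it exists, batch submissions attach to that session.
# Deleting it between submissions forces a fresh standalone cluster.
PROPS_FILE="/tmp/.yarn-properties-$USER"
rm -f "$PROPS_FILE"
ls "$PROPS_FILE" 2>/dev/null || echo "properties file removed"
```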
>>> >
>>> > – Ufuk
>>> >
>>> > On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr>
>>> > wrote:
>>> >> Hi,
>>> >>
>>> >> The zookeeper path is only for my persistent container, and I do use a
>>> >> different one for each of my persistent containers.
>>> >>
>>> >> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
>>> >> ("${JVM_ARGS} -Drecovery.mode=standalone
>>> >> -Dyarn.properties-file.location=/tmp/flink/batch")
>>> >>
>>> >> I've tried using -yD recovery.mode=standalone on the flink command line
>>> >> too, but it does not solve the problem; it still uses the pre-existing
>>> >> container.
>>> >>
>>> >> Complete line =
>>> >> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu
>>> >> batch1 -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s
>>> >> -yD recovery.mode=standalone --class
>>> >> com.bouygtel.kubera.main.segstage.MainGeoSegStage
>>> >> /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar
>>> >> -j /usr/users/datcrypt/alinz/KBR/GOS/log -c
>>> >> /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
>>> >>
>>> >> JVM_ARGS =
>>> >> -Drecovery.mode=standalone
>>> >> -Dyarn.properties-file.location=/tmp/flink/batch
>>> >>
>>> >>
>>> >> Arnaud
>>> >>
>>> >>
>>> >> -----Original Message-----
>>> >> From: Ufuk Celebi [mailto:uce@apache.org]
>>> >> Sent: Monday, June 6, 2016 14:37
>>> >> To: user@flink.apache.org
>>> >> Subject: Re: Yarn batch not working with standalone yarn job manager
>>> >> once a persistent, HA job manager is launched ?
>>> >>
>>> >> Hey Arnaud,
>>> >>
>>> >> The cause of this is probably that both jobs use the same ZooKeeper
>>> >> root path, in which case all task managers connect to the same leading
>>> >> job manager.
>>> >>
>>> >> I think you forgot to add the y in the -Drecovery.mode=standalone
>>> >> for the batch jobs, e.g.
>>> >>
>>> >> -yDrecovery.mode=standalone
>>> >>
>>> >> Can you try this?
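The distinction matters because a plain -D in JVM_ARGS only sets a system property on the client JVM, while -yD hands a dynamic property to the YARN cluster being started. A hypothetical corrected invocation (options abbreviated; class name, jar, and paths are placeholders, not the original command):

```shell
# Assemble the batch submission so the properties travel with -yD (dynamic
# properties for the YARN cluster) instead of client-side -D JVM flags.
FLINK_CMD="/usr/lib/flink/bin/flink run -m yarn-cluster \
  -yDrecovery.mode=standalone \
  -yDyarn.properties-file.location=/tmp/flink/batch \
  --class com.example.MainJob job.jar"
echo "$FLINK_CMD"
```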
>>> >>
>>> >> – Ufuk
>>> >>
>>> >> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr>
>>> >> wrote:
>>> >>> Hi,
>>> >>>
>>> >>>
>>> >>>
>>> >>> I use Flink 1.0.0. I have a persistent yarn container set (a
>>> >>> persistent flink job manager) that I use for streaming jobs, and I
>>> >>> use the “yarn-cluster” mode to launch my batches.
>>> >>>
>>> >>> I’ve just switched “HA” mode on for my streaming persistent job
>>> >>> manager and it seems to work; however, my batches are no longer
>>> >>> working, because they now execute themselves inside the persistent
>>> >>> container (and fail because it lacks slots) and not in a separate
>>> >>> standalone job manager.
>>> >>>
>>> >>> My batch launch options:
>>> >>>
>>> >>>
>>> >>>
>>> >>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm
>>> >>> $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD
>>> >>> yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD
>>> >>> akka.ask.timeout=300s"
>>> >>>
>>> >>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone
>>> >>> -Dyarn.properties-file.location=/tmp/flink/batch"
>>> >>>
>>> >>>
>>> >>>
>>> >>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA
>>> >>> $JAR_SUPP $listArgs $ACTION
>>> >>>
>>> >>>
>>> >>>
>>> >>> My persistent cluster launch option :
>>> >>>
>>> >>>
>>> >>>
>>> >>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10
>>> >>> -Drecovery.mode=zookeeper
>>> >>> -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS}
>>> >>> -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH}
>>> >>> -Dstate.backend=filesystem
>>> >>> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints
>>> >>> -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
>>> >>>
>>> >>>
>>> >>>
>>> >>> $FLINK_DIR/yarn-session.sh
>>> >>> -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO
>>> >>> $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS
>>> >>> -tm $FLINK_MEMORY -qu $FLINK_QUEUE  -nm
>>> >>> ${GANESH_TYPE_PF}_KuberaFlink
>>> >>>
>>> >>>
>>> >>>
>>> >>> I’ve switched back to the FLINK_HA_OPTIONS="" way of launching the
>>> >>> container for now, but I lack HA.
>>> >>>
>>> >>>
>>> >>>
>>> >>> Is it a (un)known bug or am I missing a magic option?
>>> >>>
>>> >>>
>>> >>>
>>> >>> Best regards,
>>> >>>
>>> >>> Arnaud
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> ________________________________
>>> >>>
>>> >>> The integrity of this message cannot be guaranteed on the Internet.
>>> >>> The company that sent this message cannot therefore be held liable
>>> >>> for its content nor attachments. Any unauthorized use or
>>> >>> dissemination is prohibited. If you are not the intended recipient
>>> >>> of this message, then please delete it and notify the sender.
>>
>>
