flink-user mailing list archives

From vino yang <yanghua1...@gmail.com>
Subject Re: Checkpointing not happening in Standalone HA mode
Date Fri, 27 Jul 2018 08:30:33 GMT
Hi Vinay,

Oh! You use a collection source? That's the problem. Please use a general
source like Kafka or others. Maybe your job had already stopped before any
checkpoint could be triggered.
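
To make the timing argument concrete, here is a minimal sketch in plain Java.
It is a toy model, not Flink code: the class name, the method name and the
50 ms finish time are all assumptions chosen for illustration. It only shows
that a periodic trigger with a 1000 ms interval never fires while the source
is still running if the source finishes before the first interval elapses.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Flink code): periodic checkpoint triggers vs. source lifetime. */
class CheckpointTimeline {

    /**
     * Returns the trigger times (in ms) at which the source is still running.
     * A trigger at or after sourceFinishMs counts as aborted, which mirrors the
     * "not being executed at the moment. Aborting checkpoint" log message.
     * Long.MAX_VALUE for sourceFinishMs models a source that never finishes.
     */
    static List<Long> triggered(long intervalMs, long sourceFinishMs, long observeMs) {
        List<Long> ok = new ArrayList<>();
        for (long t = intervalMs; t <= observeMs; t += intervalMs) {
            if (t < sourceFinishMs) {
                ok.add(t);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Collection source: assumed (hypothetically) to finish after about 50 ms.
        System.out.println(triggered(1000, 50, 5000));
        // Unbounded source such as Kafka: never finishes within the observed window.
        System.out.println(triggered(1000, Long.MAX_VALUE, 5000));
    }
}
```

The first call prints an empty list and the second prints
[1000, 2000, 3000, 4000, 5000], so under these assumptions only the
long-running source ever sees a checkpoint.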

Thanks, vino.

2018-07-27 16:07 GMT+08:00 Vinay Patil <vinay18.patil@gmail.com>:

> Hi Vino,
>
> Yes I am enabling checkpoint in the code as follows :
>
> StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment("<job_manager_host>", <job_manager_port>, getJobConfiguration(), jarPath);
>
>
> env.enableCheckpointing(1000);
>
> env.setStateBackend(new FsStateBackend("file:///<shared_mount_point_location>"));
>
> env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000);
>
>
> In the getJobConfiguration method I have set HA-related properties like
> HA_STORAGE_PATH, HA_ZOOKEEPER_QUORUM, HA_ZOOKEEPER_ROOT, HA_MODE,
> HA_JOB_MANAGER_PORT_RANGE, HA_CLUSTER_ID
>
>
> I can see the error in Job Manager logs where it says Collection Source is
> not being executed at the moment. Aborting checkpoint. In the pipeline I
> have a stream initialized using "fromCollection". I think I will have to
> get rid of this.
>
> What do you suggest?
>
> Regards,
> Vinay Patil
>
>
> On Thu, Jul 26, 2018 at 12:04 PM vino yang <yanghua1127@gmail.com> wrote:
>
>> Hi Vinay:
>>
>> Did you call the checkpoint configuration API described in this
>> documentation [1]?
>>
>> Can you share your job program and JM log? Does the JM log contain a
>> message matching the pattern "Triggering checkpoint {} @ {} for job {}."?
>>
>> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing
>>
>> Thanks, vino.
>>
>> 2018-07-25 19:43 GMT+08:00 Chesnay Schepler <chesnay@apache.org>:
>>
>>> Can you provide us with the job code?
>>>
>>> I assume that checkpointing runs properly if you submit the same job to
>>> a normal cluster?
>>>
>>>
>>> On 25.07.2018 13:15, Vinay Patil wrote:
>>>
>>> No error in the logs. That is why I am not able to understand why
>>> checkpoints are not getting triggered.
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>>
>>> On Wed, Jul 25, 2018 at 4:44 PM Vinay Patil <vinay18.patil@gmail.com>
>>> wrote:
>>>
>>>> Hi Chesnay,
>>>>
>>>> No error in the logs. That is why I am not able to understand why
>>>> checkpoints are not getting triggered.
>>>>
>>>> Regards,
>>>> Vinay Patil
>>>>
>>>>
>>>> On Wed, Jul 25, 2018 at 4:36 PM Chesnay Schepler <chesnay@apache.org>
>>>> wrote:
>>>>
>>>>> Please check the job- and taskmanager logs for anything suspicious.
>>>>>
>>>>> On 25.07.2018 12:33, Vinay Patil wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am starting the cluster using a bootstrap application in which I
>>>>> call the Job Manager and Task Manager main classes to form the cluster.
>>>>> The HA cluster is formed correctly and I am able to submit jobs to this
>>>>> cluster using RemoteExecutionEnvironment, but when I enable checkpointing
>>>>> in code I do not see any checkpoints triggered on the Flink UI.
>>>>>
>>>>> Am I missing any configuration that needs to be set on the
>>>>> RemoteExecutionEnvironment for checkpointing to work?
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vinay Patil
>>>>>
>>>>>
>>>>>
>>>
>>
