Hi Ramya

Removed the dev mail list in receiver.

Can you check the configuration of your "Job Manager" tab via web UI to see whether state.savepoints.dir [1] existed? If that existed, default savepoint directory is already given and such problem should not happen.



[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#state-savepoints-dir

Best
Yun Tang

From: Ramya Ramamurthy <hairums@gmail.com>
Sent: Friday, January 31, 2020 15:03
To: dev@flink.apache.org <dev@flink.apache.org>
Cc: user <user@flink.apache.org>
Subject: Re: REST Monitoring Savepoint failed
 
Hi Till,

I am using flink 1.7.
This is my observation.

a) I first trigger a savepoint. this is stored on my Google cloud storage.
b) When i invoke the rescale HTTP API, i get the error telling savepoints
dir is not configured. But post triggering a), i could verify the savepoint
directory present in GCS in the mentioned path.

Below is the snapshot of my deployment file.

Environment:
      JOB_MANAGER_RPC_ADDRESS:                svc-flink-jobmanager-gcs
      HIGH_AVAILABILITY:                      zookeeper
      HIGH_AVAILABILITY_ZOOKEEPER:            zookeeper-dev-1:2181
      HIGH_AVAILABILITY_ZOOKEEPER_PATH_ROOT:  /flink-gcs-chk
      HIGH_AVAILABILITY_CLUSTER_ID:           fs.default_ns
      HIGH_AVAILABILITY_STORAGEDIR:
gs://xxxxx/flink/flink-gcs/checkpoints
      HIGH_AVAILABILITY_JOBMANAGER_PORT:      6123
      STATE_CHECKPOINTS_DIR:
 gs://xxxxx/flink/flink-gcs/flink-checkpoints
      STATE_SAVEPOINTS_DIR:
gs://xxxxx/flink/flink-gcs/flink-savepoints

Response to my Savepoints REST API is as below:
{
    "status": {
        "id": "COMPLETED"
    },
    "operation": {
        "location":
"gs://xxxxx/flink/flink-gcs/flink-savepoints/savepoint-d02217-ec0aced1fe9c"
    }
}

So why does the job doesnt recognize this savepoint directory ? Also,
during this operation, i could see the Checkpoints directory for this job
gets deleted. Post which, no checkpoints are happening. any thoughts here
would really help us in progressing.

Thanks,

On Thu, Jan 30, 2020 at 8:45 PM Till Rohrmann <trohrmann@apache.org> wrote:

> Hi Ramya,
>
> I think this message is better suited for the user ML list. Which version
> of Flink are you using? Have you checked the Flink logs to see whether they
> contain anything suspicious?
>
> Cheers,
> Till
>
> On Thu, Jan 30, 2020 at 1:09 PM Ramya Ramamurthy <hairums@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to dynamically increase the parallelism of the job. In the
> > process of it, while I am trying to trigger the savepoint, i get
> > the following error. Any help would be appreciated.
> >
> > The URL triggered is :
> >
> >
> http://stgflink.ssdev.in:8081/jobs/2865c1f40340bcb19a88a01d6ef8ff4f/savepoints/
> > {
> >     "target-directory" :
> > "gs://xxxx-bucket/flink/flink-gcs/flink-savepoints",
> >     "cancel-job" : "false"
> > }
> >
> > Error as below:
> >
> >
> >
> {"status":{"id":"COMPLETED"},"operation":{"failure-cause":{"class":"java.util.concurrent.CompletionException","stack-trace":"java.util.concurrent.CompletionException:
> > java.util.concurrent.CompletionException:
> > org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to
> > trigger savepoint. Decline reason: An Exception occurred while triggering
> > the checkpoint.\n\tat
> >
> >
> org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:972)\n\tat
> >
> >
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
> >
> >
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)\n\tat
> >
> >
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)\n\tat
> >
> >
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)\n\tat
> >
> >
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)\n\tat
> > akka.actor.Actor$class.aroundReceive(Actor.scala:502)\n\tat
> > akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)\n\tat
> > akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)\n\tat
> > akka.actor.ActorCell.invoke(ActorCell.scala:495)\n\tat
> > akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)\n\tat akka.dis
> >
>