Hi Jim,

What are your checkpointing settings? Are you checkpointing to a distributed file system, such as HDFS or S3, or to the local file system? The latter should not be used in a production setting, and I would not expect it to work properly, unless the local file system is actually a network-mounted file system.
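
To compare the nodes, it may also help to see how large each checkpoint directory actually is. Below is a minimal Python sketch, assuming the chk-<id> layout from your message; the function name summarize_checkpoints is hypothetical, and the directory path is whatever you pass on the command line.

```python
import os
import re
import sys

# Matches directory names such as "chk-4645" (layout assumed from the message).
CHK_PATTERN = re.compile(r"^chk-(\d+)$")


def summarize_checkpoints(checkpoint_dir):
    """Return a list of (checkpoint_id, size_in_bytes), sorted by checkpoint id."""
    summary = []
    for entry in os.scandir(checkpoint_dir):
        match = CHK_PATTERN.match(entry.name)
        if not (match and entry.is_dir()):
            continue  # skip anything that is not a chk-<id> directory
        size = 0
        # Sum the sizes of all files below this checkpoint directory.
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                size += os.path.getsize(os.path.join(root, name))
        summary.append((int(match.group(1)), size))
    return sorted(summary)


if __name__ == "__main__" and len(sys.argv) > 1:
    for chk_id, size in summarize_checkpoints(sys.argv[1]):
        print(f"chk-{chk_id}\t{size} bytes")
```

Running this against the checkpoint directory on each node should show whether the extra space comes from a few very large checkpoints or from many old ones that were never cleaned up.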

Best,
Aljoscha

On 15. May 2017, at 17:05, Jim Langston <jlangston@resolutebi.com> wrote:

Hi all,
 
I have a long-running streaming app that saves checkpoints to
the file system.
 
What is the layout of the checkpoint directory? My current
checkpoint directory has >2000 directories in it, similar to this:
 
chk-4645
 
 
Also, the directory has grown to >3 GB.
 
I have a small cluster. All nodes were started at the same time and nothing
has been restarted, but this is occurring on only one of the nodes. The others
have about the same number of directories in the checkpoint directory, but
their directories are not nearly as large.
 
 
Why are there so many chk-xxxx directories? And why can they become
so large? Is there something I should be setting in the yaml file?
 
I was going to just remove them, but it struck me as odd that there
are so many …
 
 
Thanks
 
Jim