flink-user mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: Checking for existence of output directory/files before running a batch job
Date Mon, 22 Aug 2016 08:55:09 GMT
Yes, that did the trick. Thanks.
I was using a relative path without any FS specification.
So my path was "foo" and on the cluster this resolves to
"hdfs:///user/nbasjes/foo"
Locally this resolved to "file:///home/nbasjes/foo" and hence the mismatch
I was looking at.
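
To make the mismatch concrete, here is a minimal sketch in plain Java (no Flink or Hadoop classes) of how one relative path can end up on two different file systems. The class name `PathQualifier`, the method `qualify`, and the way the file system prefix and working directory are passed in are all hypothetical and only for illustration; the real resolution is done by the file system classes and the Hadoop configuration, not by code like this.

```java
// Illustration only: mimics how a bare relative path picks up a different
// file system and working directory depending on where the code runs.
// Nothing here is a Flink or Hadoop API.
class PathQualifier {

    // True if the path already names its file system, e.g. "hdfs:///path/to/foo".
    static boolean hasScheme(String path) {
        return path.matches("[A-Za-z][A-Za-z0-9+.-]*:.*");
    }

    // fsPrefix is the scheme plus an (empty) authority, e.g. "hdfs://" or "file://".
    // workingDir is the directory that relative paths are resolved against.
    static String qualify(String path, String fsPrefix, String workingDir) {
        if (hasScheme(path)) {
            return path; // fully qualified paths are left untouched
        }
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        return fsPrefix + absolute;
    }

    public static void main(String[] args) {
        // On the cluster: default file system is HDFS, working dir is the HDFS home.
        System.out.println(qualify("foo", "hdfs://", "/user/nbasjes"));
        // On the local machine: default is the local file system.
        System.out.println(qualify("foo", "file://", "/home/nbasjes"));
        // A fully qualified path means the same thing in both places.
        System.out.println(qualify("hdfs:///path/to/foo", "file://", "/home/nbasjes"));
    }
}
```

The first two calls give the two different locations mentioned above for the same "foo"; the third shows why the fully-qualified path suggested by Max avoids the problem.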

For now I can work with this fine.

Yet I think a 'getFileSystem()' method on the ExecutionEnvironment instance
that returns the actual filesystem against which my job "is going to be
executed" would solve this in an easier way. That way I can use a relative
path (i.e. "foo") and run it anywhere (local, Yarn, Mesos, etc.) without
any problems.

What do you guys think?
Is this desirable? Possible?

Niels.



On Fri, Aug 19, 2016 at 3:22 PM, Robert Metzger <rmetzger@apache.org> wrote:

> Ooops. Looks like Google Mail / Apache / the internet needs 13 minutes to
> deliver an email.
> Sorry for double answering.
>
> On Fri, Aug 19, 2016 at 3:07 PM, Maximilian Michels <mxm@apache.org>
> wrote:
>
>> Hi Niels,
>>
>> Have you tried specifying the fully-qualified path? The default is the
>> local file system.
>>
>> For example, hdfs:///path/to/foo
>>
>> If that doesn't work, do you have the same Hadoop configuration on the
>> machine where you test?
>>
>> Cheers,
>> Max
>>
>> On Thu, Aug 18, 2016 at 2:02 PM, Niels Basjes <Niels@basjes.nl> wrote:
>> > Hi,
>> >
>> > I have a batch job that I run on yarn that creates files in HDFS.
>> > I want to avoid running this job at all if the output already exists.
>> >
>> > So in my code (before submitting the job into yarn-session) I do this:
>> >
>> >     String directoryName = "foo";
>> >
>> >     Path directory = new Path(directoryName);
>> >     FileSystem fs = directory.getFileSystem();
>> >
>> >     if (!fs.exists(directory)) {
>> >
>> >         // run the job
>> >
>> >     }
>> >
>> > What I found is that this code apparently checks the 'wrong' file
>> > system. (I always get 'false' even if it exists in hdfs)
>> >
>> > I checked the API of the execution environment yet I was unable to get
>> > the 'correct' filesystem from there.
>> >
>> > What is the proper way to check this?
>> >
>> >
>> > --
>> > Best regards / Met vriendelijke groeten,
>> >
>> > Niels Basjes
>>
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
