flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Recursive Traversal of the Input Path Directory, Not working
Date Tue, 27 Jun 2017 10:16:46 GMT
Hi,

Hadoop FileInputFormats (by default) also exclude hidden files (files starting with “.”
or “_”). You can override this behaviour in Flink by subclassing TextInputFormat and overriding
the accept() method. You can use a custom input format with ExecutionEnvironment.readFile().
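For illustration, a minimal sketch of such a subclass. This is an assumption-laden sketch, not a tested implementation: the overridable hook in Flink 1.3's FileInputFormat is named acceptFile(FileStatus), and its default rejects names starting with “.” or “_”; verify the exact name and visibility against your Flink version.

```scala
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.{FileStatus, Path}

// Sketch (assumed hook name acceptFile): let "_"-prefixed files through
// while still skipping dot-files.
class PermissiveTextInputFormat(path: Path) extends TextInputFormat(path) {
  override def acceptFile(fileStatus: FileStatus): Boolean = {
    val name = fileStatus.getPath.getName
    !name.startsWith(".") // default would also reject names starting with "_"
  }
}

// Hypothetical usage with readFile():
// val input = env.readFile(new PermissiveTextInputFormat(new Path(featuresSource)), featuresSource)
```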

Regarding BucketingSink, you can change both the prefixes and suffixes of the various files
using configuration methods.
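For example (a sketch based on setter names in the flink-connector-filesystem BucketingSink of that era; the base path is a placeholder, and the method names should be checked against your Flink version):

```scala
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

// Sketch: clear the "_" prefixes so finished files are picked up by
// directory scans; suffixes still mark in-progress/pending files.
val sink = new BucketingSink[String]("/base/output/path")
sink.setInProgressPrefix("")             // default is "_"
sink.setPendingPrefix("")                // default is "_"
sink.setInProgressSuffix(".in-progress") // keep a marker suffix instead
sink.setPendingSuffix(".pending")
sink.setPartPrefix("part")               // prefix of finished part files
```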

Best,
Aljoscha

> On 27. Jun 2017, at 11:53, Adarsh Jain <eradarshjain@gmail.com> wrote:
> 
> Thanks Stefan, my colleague Shashank has filed a bug for the same in jira
> 
> https://issues.apache.org/jira/browse/FLINK-6993
> 
> Regards,
> Adarsh
> 
> On Fri, Jun 23, 2017 at 8:19 PM, Stefan Richter <s.richter@data-artisans.com> wrote:
> Hi,
> 
> I suggest that you simply open an issue for this in our jira, describing the improvement idea. That should be the fastest way to get this changed.
> 
> Best,
> Stefan
> 
>> On 23.06.2017 at 15:08, Adarsh Jain <eradarshjain@gmail.com> wrote:
>> 
>> Hi Stefan,
>> 
>> I think I found the problem; try it with a file whose name starts with an underscore, like "_part-1-0.csv".
>> 
>> While saving, Flink prepends a "_" to the file name; however, while reading at the folder level it does not pick up those files.
>> 
>> Can you suggest a setting so that it does not prepend an underscore while saving a file?
>> 
>> Regards,
>> Adarsh
>> 
>> On Fri, Jun 23, 2017 at 3:24 PM, Stefan Richter <s.richter@data-artisans.com> wrote:
>> No, that doesn’t make a difference and also works.
>> 
>>> On 23.06.2017 at 11:40, Adarsh Jain <eradarshjain@gmail.com> wrote:
>>> 
>>> I am using "val env = ExecutionEnvironment.getExecutionEnvironment"; can this be the problem?
>>> 
>>> With "import org.apache.flink.api.scala.ExecutionEnvironment"
>>> 
>>> Using scala in my program.
>>> 
>>> Regards,
>>> Adarsh 
>>> 
>>> On Fri, Jun 23, 2017 at 3:01 PM, Stefan Richter <s.richter@data-artisans.com> wrote:
>>> I just copy pasted your code, adding the missing "val env = LocalEnvironment.createLocalEnvironment()", and exchanged the string with a local directory for some test files that I created. No other changes.
>>> 
>>>> On 23.06.2017 at 11:25, Adarsh Jain <eradarshjain@gmail.com> wrote:
>>>> 
>>>> Hi Stefan,
>>>> 
>>>> Thanks for your efforts in checking the same; it still doesn't work for me.
>>>> 
>>>> Can you copy paste the code you used? Maybe I am making some silly mistake and am not able to figure it out.
>>>> 
>>>> Thanks again.
>>>> 
>>>> Regards,
>>>> Adarsh
>>>> 
>>>> 
>>>> On Fri, Jun 23, 2017 at 2:32 PM, Stefan Richter <s.richter@data-artisans.com> wrote:
>>>> Hi,
>>>> 
>>>> I tried this out on the current master and the 1.3 release, and both work for me: everything behaves exactly as expected for file names, a directory, and even nested directories.
>>>> 
>>>> Best,
>>>> Stefan
>>>> 
>>>>> On 22.06.2017 at 21:13, Adarsh Jain <eradarshjain@gmail.com> wrote:
>>>>> 
>>>>> Hi Stefan,
>>>>> 
>>>>> Yes, you understood right: when I give the full path up to the filename it works fine, but when I give the path only up to the directory it does not read the data and doesn't print any exceptions either. I am also not sure why it behaves like this.
>>>>> 
>>>>> Should be easily replicable, in case you can try. Will be really helpful.
>>>>> 
>>>>> Regards,
>>>>> Adarsh
>>>>> 
>>>>> On Thu, Jun 22, 2017 at 9:00 PM, Stefan Richter <s.richter@data-artisans.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> I am not sure I am getting the problem right: the code works if you use a file name, but it does not work for directories? What exactly is not working? Do you get any exceptions?
>>>>> 
>>>>> Best,
>>>>> Stefan
>>>>> 
>>>>>> On 22.06.2017 at 17:01, Adarsh Jain <eradarshjain@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I am trying to use "Recursive Traversal of the Input Path Directory" in Flink 1.3 using Scala; a snippet of my code is below. If I give the exact file name it works fine. Ref: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html
>>>>>> 
>>>>>> import org.apache.flink.api.java.utils.ParameterTool
>>>>>> import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
>>>>>> import org.apache.flink.configuration.Configuration
>>>>>> 
>>>>>> val config = new Configuration
>>>>>> config.setBoolean("recursive.file.enumeration", true)
>>>>>> 
>>>>>> val featuresSource: String = "file:///Users/adarsh/Documents/testData/featurecsv/31c710ac40/2017/06/22"
>>>>>> 
>>>>>> val testInput = env.readTextFile(featuresSource).withParameters(config)
>>>>>> testInput.print()
>>>>>> 
>>>>>> Please guide how to fix this.
>>>>>> 
>>>>>> Regards,
>>>>>> Adarsh
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

