flink-dev mailing list archives

From "Shashank Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-6993) Not reading recursive files in Batch by using readTextFile when file name contains _ in starting.
Date Fri, 23 Jun 2017 13:36:06 GMT
Shashank Agarwal created FLINK-6993:
---------------------------------------

             Summary: Not reading recursive files in Batch by using readTextFile when file
name contains _ in starting.
                 Key: FLINK-6993
                 URL: https://issues.apache.org/jira/browse/FLINK-6993
             Project: Flink
          Issue Type: Bug
          Components: Batch Connectors and Input/Output Formats
    Affects Versions: 1.3.0
            Reporter: Shashank Agarwal
            Priority: Critical
             Fix For: 1.3.2


When I read files from a folder in batch mode using readTextFile with recursive.file.enumeration enabled,
files whose names start with _ are not read. After I removed the leading _ from the file names,
the files were read correctly.

Reading also works when the path points directly to a single file; it fails only with a directory path.
To reproduce the issue:

{code}
import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}
import org.apache.flink.configuration.Configuration

object CSVMerge {

  def main(args: Array[String]): Unit = {

    val env = ExecutionEnvironment.getExecutionEnvironment

    // create a configuration object
    val parameters = new Configuration

    // set the recursive enumeration parameter
    parameters.setBoolean("recursive.file.enumeration", true)

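    // read every file under the directory (this is a batch DataSet, not a stream)
    val lines: DataSet[String] = env.readTextFile("file:///Users/data")
      .withParameters(parameters)

    // print() triggers execution of the batch job
    lines.print()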

  }
}
{code}

With 2-3 text files named 1.txt, 2.txt, etc. in the data folder, this works fine.
With files named _1.txt, _2.txt in the same folder, those files are not read.
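
To make the setup above easier to repeat, here is a minimal sketch that creates both kinds of files in one directory. It uses only java.nio and does not touch Flink. The object name ReproFiles, the temp-directory location, and the file contents are hypothetical choices for illustration; only the four file names come from this report.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object ReproFiles {
  // File names from the report: two plain names, two with a leading underscore.
  val names: Seq[String] = Seq("1.txt", "2.txt", "_1.txt", "_2.txt")

  // Writes one small text file per name into `dir` and returns the created paths.
  def create(dir: Path): Seq[Path] = {
    Files.createDirectories(dir)
    names.map { name =>
      val content = s"content of $name\n" // placeholder content, not from the report
      Files.write(dir.resolve(name), content.getBytes(StandardCharsets.UTF_8))
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical location: a fresh temp directory instead of /Users/data.
    val dir = Files.createTempDirectory("flink-6993-")
    create(dir).foreach(println)
  }
}
```

Pointing readTextFile at the printed directory (as a file:// URI) should then show whether the underscore-prefixed files are picked up.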

Flink's BucketingSink in streaming jobs puts _ before file names by default, so files written
by a DataStream sink cannot be read back this way.
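
One possible explanation, stated here as an assumption and not confirmed in this report, is that directory enumeration applies a name-based filter that treats files starting with _ (or .) as hidden, similar to the convention Hadoop input formats follow. The sketch below only illustrates that kind of rule on the file names above; the object UnderscoreFilterSketch and the function accepted are hypothetical and are not Flink code.

```scala
object UnderscoreFilterSketch {
  // Assumed rule (an illustration, NOT copied from Flink):
  // names starting with "_" or "." are treated as hidden and skipped.
  def accepted(fileName: String): Boolean =
    !fileName.startsWith("_") && !fileName.startsWith(".")

  def main(args: Array[String]): Unit = {
    val names = Seq("1.txt", "2.txt", "_1.txt", "_2.txt")
    // Split the names into those the assumed rule keeps and those it skips.
    val (kept, skipped) = names.partition(accepted)
    println(s"kept: ${kept.mkString(", ")}")       // kept: 1.txt, 2.txt
    println(s"skipped: ${skipped.mkString(", ")}") // skipped: _1.txt, _2.txt
  }
}
```

If a rule like this is in effect, it would match the observed behavior for directory paths: 1.txt and 2.txt are read, while _1.txt and _2.txt are silently skipped.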



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
