apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2103) scanner issues in FileSplitterInput class
Date Tue, 31 May 2016 22:25:12 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308763#comment-15308763 ]

ASF GitHub Bot commented on APEXMALHAR-2103:
--------------------------------------------

Github user DT-Priyanka commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/300#discussion_r65274187
  
    --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/FileSplitterInput.java ---
    @@ -375,11 +374,18 @@ public void run()
                 lastScannedInfo = null;
                 numDiscoveredPerIteration = 0;
                 for (String afile : files) {
    -              String filePath = new File(afile).getAbsolutePath();
    -              LOG.debug("Scan started for input {}", filePath);
    -              Map<String, Long> lastModifiedTimesForInputDir;
    -              lastModifiedTimesForInputDir = referenceTimes.get(filePath);
    -              scan(new Path(afile), null, lastModifiedTimesForInputDir);
    +              Path filePath = new Path(afile);
    +              LOG.debug("Scan started for input {}", filePath.toString());
    +              Map<String, Long> lastModifiedTimesForInputDir = null;
    +              if (fs.exists(filePath)) {
    +                FileStatus fileStatus = fs.getFileStatus(filePath);
    +                if (fileStatus.isDirectory()) {
    +                  lastModifiedTimesForInputDir = referenceTimes.get(fileStatus.getPath().toString());
    +                } else {
    +                  lastModifiedTimesForInputDir = referenceTimes.get(fileStatus.getPath().getParent().toString());
    --- End diff --
    
    This is not right. If the user gives the inputs /home/myDir and /home/myDir/file1.txt,
    the scan of the second input, /home/myDir/file1.txt, will overwrite the reference
    times for the input /home/myDir.
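
To illustrate the concern, here is a minimal sketch of the collision. It is not the FileSplitterInput code: it assumes java.nio.file paths instead of Hadoop's Path, and `keyFor` and `recordScan` are hypothetical helpers that only mirror the keying rule in the diff (a directory is keyed by its own path, a file by its parent).

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class ReferenceTimeKeys {

  // Same keying rule as the diff: a directory is keyed by its own path,
  // a file by its parent directory. (Hypothetical helper.)
  static String keyFor(Path input, boolean isDirectory) {
    return isDirectory ? input.toString() : input.getParent().toString();
  }

  // Hypothetical scan step: stores a fresh map of last-modified times
  // under the key computed for this input, replacing any earlier entry.
  static void recordScan(Map<String, Map<String, Long>> referenceTimes,
      Path input, boolean isDirectory, Map<String, Long> seen) {
    referenceTimes.put(keyFor(input, isDirectory), new HashMap<>(seen));
  }

  public static void main(String[] args) {
    Map<String, Map<String, Long>> referenceTimes = new HashMap<>();
    Path dir = Paths.get("/home/myDir");
    Path file = Paths.get("/home/myDir/file1.txt");

    // Scan of the first input (the directory) sees two files.
    Map<String, Long> dirScan = new HashMap<>();
    dirScan.put("file1.txt", 100L);
    dirScan.put("file2.txt", 200L);
    recordScan(referenceTimes, dir, true, dirScan);

    // Scan of the second input (a single file) sees only that file.
    Map<String, Long> fileScan = new HashMap<>();
    fileScan.put("file1.txt", 100L);
    recordScan(referenceTimes, file, false, fileScan);

    // Both inputs resolve to the same key.
    System.out.println(keyFor(dir, true).equals(keyFor(file, false))); // true
    // The directory's entry for file2.txt was lost when the key was reused.
    System.out.println(referenceTimes.get(keyFor(dir, true)).containsKey("file2.txt")); // false
  }
}
```

One possible way out, under the same assumptions, is to key each input by the input path itself rather than by its parent, so that the two inputs never share an entry.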


> scanner issues in FileSplitterInput class
> -----------------------------------------
>
>                 Key: APEXMALHAR-2103
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2103
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Chaitanya
>            Assignee: Chaitanya
>
> Issue: FileSplitter continuously emits file metadata even though there is only a single
> file.
> Observation: For the same file, the keys used to update and to access the referenceTimes
> map in FileSplitterInput and TimeBasedScanner are different. Because of this, the
> oldestTimeModification is always null in TimeBasedScanner.
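
The observation above can be sketched as follows. The issue does not say which key forms are involved, so the two used here (a scheme-qualified string and a bare absolute path) are assumptions, the flat map is a simplification of referenceTimes, and `normalize` is a hypothetical helper rather than anything in Malhar.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class ReferenceTimeLookup {

  // Hypothetical normalizer: reduce either key form to its bare path component.
  static String normalize(String rawKey) {
    return URI.create(rawKey).getPath();
  }

  public static void main(String[] args) {
    // Writer and reader disagree on the key form, so the lookup misses.
    Map<String, Long> byRawKey = new HashMap<>();
    byRawKey.put("file:/home/myDir/file1.txt", 100L);
    System.out.println(byRawKey.get("/home/myDir/file1.txt")); // null

    // Routing both sides through one normalizer makes the lookup hit.
    Map<String, Long> byNormalizedKey = new HashMap<>();
    byNormalizedKey.put(normalize("file:/home/myDir/file1.txt"), 100L);
    System.out.println(byNormalizedKey.get(normalize("/home/myDir/file1.txt"))); // 100
  }
}
```

A lookup that always misses would make every scan treat the file as new, which is consistent with the repeated emission described in the issue.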



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
