apex-dev mailing list archives

From "Tushar Gosavi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2274) AbstractFileInputOperator gets killed when there are a large number of files.
Date Wed, 26 Oct 2016 06:16:58 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607573#comment-15607573 ]

Tushar Gosavi commented on APEXMALHAR-2274:
-------------------------------------------

You can take a look at FileSplitterInput, which runs the scanner in a separate thread and passes
the data to the main operator thread through a thread-safe queue. You can reuse the same approach;
maybe we can combine the scanner for both operators.
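
For illustration, a minimal sketch of that pattern (class, field, and batch-size names below are
hypothetical, not taken from FileSplitterInput itself): a scanner thread feeds discovered paths
into a thread-safe queue, and the operator thread drains a bounded batch per emitTuples() call so
it never blocks on a slow directory listing:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    // Hypothetical sketch: scanning happens off the operator thread,
    // so a long listing cannot stall the operator's event loop.
    public class ScannerThreadSketch
    {
      private final BlockingQueue<Path> discovered = new LinkedBlockingQueue<>();

      private final Thread scanner = new Thread(() -> {
        try {
          FileSystem fs = FileSystem.get(new Configuration());
          RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/monitored/dir"), false);
          while (it.hasNext()) {
            discovered.put(it.next().getPath());  // blocks only the scanner thread
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }, "directory-scanner");

      public ScannerThreadSketch()
      {
        scanner.setDaemon(true);
        scanner.start();
      }

      // Called from the operator thread; drains a bounded batch per call
      // so window boundaries are never held up by the scan.
      public void emitTuples()
      {
        for (int i = 0; i < 100; i++) {
          Path next = discovered.poll();  // non-blocking
          if (next == null) {
            break;
          }
          // emit(next) ...
        }
      }
    }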

> AbstractFileInputOperator gets killed when there are a large number of files.
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2274
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2274
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Munagala V. Ramanath
>            Assignee: Matt Zhang
>
> When there are a large number of files in the monitored directory, the call to DirectoryScanner.scan()
> can take a long time since it calls FileSystem.listStatus(), which returns the entire list.
> Meanwhile, the AppMaster deems the operator hung and restarts it, which again results in the
> same problem.
> It should use FileSystem.listStatusIterator() [in Hadoop 2.7.X] or FileSystem.listFiles()
> [in 2.6.X], or other similar calls that return a remote iterator, to limit the number of files
> processed in a single call.
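
For reference, a minimal sketch of the suggested fix (class name, field names, and the batch size
are illustrative assumptions, not from the issue): hold the RemoteIterator across calls and consume
a bounded number of entries per scan instead of materializing the whole directory with listStatus():

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    // Hypothetical sketch of a bounded scan using the remote-iterator API.
    public class BoundedScanSketch
    {
      private static final int BATCH = 100;  // illustrative cap, not from the issue

      private FileSystem fs;
      private RemoteIterator<FileStatus> pending;

      void scanSome(Path dir) throws java.io.IOException
      {
        if (fs == null) {
          fs = FileSystem.get(new Configuration());
        }
        if (pending == null || !pending.hasNext()) {
          pending = fs.listStatusIterator(dir);  // Hadoop 2.7.X+
        }
        int processed = 0;
        while (pending.hasNext() && processed < BATCH) {
          FileStatus status = pending.next();
          // process(status) ...
          processed++;
        }
      }
    }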



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
