beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1892) Log process during size estimation in filebasedsource
Date Thu, 06 Apr 2017 18:33:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959497#comment-15959497
] 

ASF GitHub Bot commented on BEAM-1892:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/beam/pull/2445


> Log process during size estimation in filebasedsource
> -----------------------------------------------------
>
>                 Key: BEAM-1892
>                 URL: https://issues.apache.org/jira/browse/BEAM-1892
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Sourabh Bajaj
>            Assignee: Sourabh Bajaj
>
> http://stackoverflow.com/questions/43095445/how-to-iterate-all-files-in-google-cloud-storage-to-be-used-as-dataflow-input
> The user mentioned that there was no output and a huge delay in submitting the pipeline.
> The file size estimation process can be slow for really large datasets, and it currently
> reports no progress to the end user. We should log progress and also apply a threshold to
> the pre-submission size estimation.
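
The two ideas in the description, periodic progress logging and a cap on how much work size estimation does before submission, could be sketched roughly as below. This is only an illustration, not the change made in the pull request: the function estimate_total_size and its parameters log_every and max_files are hypothetical names, and the input is assumed to be an iterable of (path, size_in_bytes) pairs rather than anything from the Beam API.

```python
import logging

_LOG = logging.getLogger(__name__)


def estimate_total_size(file_sizes, log_every=10000, max_files=None):
    """Sum file sizes, logging progress and optionally stopping early.

    Hypothetical helper for illustration only.
    file_sizes: iterable of (path, size_in_bytes) pairs.
    log_every: emit a progress log line after this many files.
    max_files: if set, stop once this many files have been counted.
    Returns (total_bytes, files_counted, complete).
    """
    total = 0
    count = 0
    for _path, size in file_sizes:
        if max_files is not None and count >= max_files:
            # Threshold reached: return a partial estimate instead of
            # delaying pipeline submission any further.
            _LOG.info('Stopping size estimation after %d files.', count)
            return total, count, False
        total += size
        count += 1
        if count % log_every == 0:
            # Periodic progress so the user can see that work is ongoing.
            _LOG.info('Size estimation in progress: %d files, %d bytes.',
                      count, total)
    _LOG.info('Size estimation finished: %d files, %d bytes.', count, total)
    return total, count, True
```

Returning a flag alongside the partial total lets the caller tell a complete estimate apart from one that was cut off at the threshold.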



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
