flink-issues mailing list archives

From GEOFBOT <...@git.apache.org>
Subject [GitHub] flink issue #3232: [FLINK-5183] [py] Support multiple jobs per plan file
Date Thu, 16 Feb 2017 19:42:06 GMT
Github user GEOFBOT commented on the issue:

    https://github.com/apache/flink/pull/3232
  
    > It may have worked with a smaller file, but there may be issues with heavier jobs.
    
    How silly of me. This problem had nothing to do with this pull request, with YARN, with
issues in Flink, or with the size of the input file itself. I was using `ExecutionEnvironment.from_elements`
to generate a large sequence of indexed zeroes to fill the gaps in another indexed DataSet.
When I used large input files, I also set larger parameters and so generated
larger zero sequences. Because I was using `from_elements`, the client needed to send all
of those values (lots and lots of zeroes) to the runtime, which was very time-consuming and
caused the timeout. I have replaced this with a `generate_sequence` call and a map function,
which does not require sending all of those values from the client to the runtime, and
the job (and this pull request) seem to work just fine.
    
    (change in question: https://github.com/quinngroup/pyflink-r1dl/commit/00a16d564bfad21fc1f4958677ada0a95fa9f088)
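
To make the difference concrete, here is a rough, Flink-free sketch of why the two approaches put such different loads on the client. It is not the Flink Python API: `pickle` is only a stand-in for whatever serialization the client actually uses, and the function names are hypothetical. The first function mimics `from_elements`, where every element is built on the client and shipped; the second mimics `generate_sequence` followed by a map, where only the bounds are shipped and the zeroes are created on the runtime side.

```python
import pickle


def payload_all_elements(n):
    """Bytes shipped if the client builds every (index, 0.0) pair itself.

    Assumption: mimics the from_elements approach; pickle stands in for
    the real client-to-runtime serialization.
    """
    elements = [(i, 0.0) for i in range(n)]
    return len(pickle.dumps(elements))


def payload_bounds_only(n):
    """Bytes shipped if the client sends only the sequence bounds.

    Assumption: mimics generate_sequence, which needs just a start and an end.
    """
    return len(pickle.dumps((0, n - 1)))


def expand_on_runtime(start, end):
    """Hypothetical stand-in for the map step that runs on the runtime side.

    Turns each index in [start, end] into an (index, 0.0) pair.
    """
    return [(i, 0.0) for i in range(start, end + 1)]


if __name__ == "__main__":
    n = 100_000
    print("all elements:", payload_all_elements(n), "bytes")
    print("bounds only: ", payload_bounds_only(n), "bytes")
```

The first payload grows linearly with the number of zeroes, while the second stays a few bytes no matter how long the sequence is, which matches the timeout showing up only with the larger parameters.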


