beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-2641) Improve discoverability of TextIO.readAll() as a replacement of TextIO.read() for large globs
Date Wed, 19 Jul 2017 17:56:00 GMT
Eugene Kirpichov created BEAM-2641:
--------------------------------------

             Summary: Improve discoverability of TextIO.readAll() as a replacement of TextIO.read() for large globs
                 Key: BEAM-2641
                 URL: https://issues.apache.org/jira/browse/BEAM-2641
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov


TextIO.readAll() dramatically outperforms TextIO.read() when reading a very large number of
files (hundreds of thousands, millions, or more).

However, it is not obvious to a user whose TextIO.read() filepattern matches that many files
that TextIO.readAll() is what they should use instead.

We should take a variety of measures to make it more discoverable, e.g.:

* Add a parameter to TextIO.read(), like "withHintManyFiles()"
* Log a message suggesting the use of that hint when splitting TextIO if the filepattern
matches a very large number of files
* Improve documentation
* Post something on StackOverflow about this
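
The logging measure above could be sketched roughly as follows. This is a standalone illustration, not Beam code: the class ManyFilesHint, its method names, the message wording, and the 100,000-file threshold are all assumptions made for the example (the issue only says "hundreds of thousands or millions"), and withHintManyFiles() is the name proposed in this issue rather than an existing method.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not part of Beam): decides whether a log message
 * should suggest TextIO.readAll() for a filepattern that matched many files.
 */
final class ManyFilesHint {
  // Assumed cutoff; the issue gives no exact number.
  static final long MANY_FILES_THRESHOLD = 100_000L;

  private ManyFilesHint() {}

  /** Returns true when the match count is at or above the assumed threshold. */
  static boolean shouldSuggestReadAll(long matchedFileCount) {
    return matchedFileCount >= MANY_FILES_THRESHOLD;
  }

  /** Builds the suggestion text, or returns empty when no hint is warranted. */
  static Optional<String> suggestion(String filepattern, long matchedFileCount) {
    if (!shouldSuggestReadAll(matchedFileCount)) {
      return Optional.empty();
    }
    return Optional.of(String.format(
        "Filepattern %s matched %d files; consider TextIO.readAll() "
            + "or the proposed withHintManyFiles() hint.",
        filepattern, matchedFileCount));
  }
}
```

A caller would compute the match count while splitting and pass any returned message to its logger; keeping the decision in a pure function makes the threshold easy to test and tune.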



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
