beam-commits mailing list archives

From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-1294) Long running UnboundedSource Readers via Broadcasts
Date Sat, 21 Jan 2017 21:05:26 GMT
Amit Sela created BEAM-1294:
-------------------------------

             Summary: Long running UnboundedSource Readers via Broadcasts
                 Key: BEAM-1294
                 URL: https://issues.apache.org/jira/browse/BEAM-1294
             Project: Beam
          Issue Type: Improvement
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela


When reading from an UnboundedSource, the current implementation causes each split to create
a new Reader in every micro-batch.

As long as the overhead of creating a reader is relatively low, this is reasonable (though I'd
still be happy to get rid of it), but when the creation overhead is large it becomes
unreasonable and forces the use of large batches.

One way to solve this could be to create a pool of lazily initialized readers to serve each
executor, maybe via Broadcast variables.
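
A minimal sketch of the lazy-init pool idea, in plain Java with no Beam or Spark dependencies.
All names here (LazyReaderPool, readerFor, the String stand-in for a reader) are hypothetical
and only illustrate the caching pattern: a reader is created on first request for a split and
reused in later micro-batches. How such a pool would reach each executor (for example through
a Broadcast variable) and how readers would be closed or checkpointed is not covered.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-JVM pool that creates a reader only on first use of a split. */
final class LazyReaderPool<K, R> {
    private final ConcurrentMap<K, R> readers = new ConcurrentHashMap<>();
    private final Function<K, R> factory;

    LazyReaderPool(Function<K, R> factory) {
        this.factory = factory;
    }

    /** Returns the cached reader for the split, creating it at most once. */
    R readerFor(K splitId) {
        return readers.computeIfAbsent(splitId, factory);
    }

    int size() {
        return readers.size();
    }
}

public class LazyReaderPoolDemo {
    public static void main(String[] args) {
        // Counts how many times the (assumed expensive) factory actually runs.
        AtomicInteger created = new AtomicInteger();
        LazyReaderPool<Integer, String> pool = new LazyReaderPool<>(id -> {
            created.incrementAndGet();
            return "reader-" + id; // stand-in for a real reader object
        });

        // Simulate 3 micro-batches over 2 splits: each split asks for its reader.
        for (int batch = 0; batch < 3; batch++) {
            for (int split = 0; split < 2; split++) {
                pool.readerFor(split);
            }
        }
        System.out.println("readers created: " + created.get()); // prints "readers created: 2"
    }
}
```

With two splits and three simulated micro-batches, the factory runs twice rather than six
times, which is the saving the proposal is after when reader creation is expensive.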



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
