beam-user mailing list archives

From Reece
Subject Grouping/splitting contents of a text file into chapters or sections
Date Mon, 31 Jul 2017 19:02:52 GMT

I'm creating a Beam pipeline that counts words by chapter from a single input file formatted
like this:

CHAPTER I. Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice...
CHAPTER II. The Pool of Tears

'Curiouser and curiouser!' cried Alice (she was so much surprised, that
for the moment she quite forgot how to speak good English); 'now I'm
opening out like the largest telescope that...

Is there a way to achieve this in Beam? Does it require extending BoundedSource? If so,
does anyone have similar examples I could work from? I can think of a couple of ways to
do this by modifying the input file before the pipeline runs, but I'm interested to know if
there's a way to do this purely in Beam.
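
To make the goal concrete, here is a minimal sketch of the grouping in plain Python (not Beam). It assumes every chapter starts with a line of the form "CHAPTER <Roman numeral>." and that text before the first heading can be ignored; the function name and both regular expressions are illustrative only. The same logic could presumably run inside a single pipeline step if the whole file arrived as one element, though that would give up parallelism within the file.

```python
import re
from collections import Counter

# Assumed heading format: "CHAPTER <Roman numeral>. <title>" on its own line.
CHAPTER_RE = re.compile(r"^CHAPTER\s+[IVXLCDM]+\.")
# A word is a run of letters, optionally with one inner apostrophe (e.g. "I'm").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_words_by_chapter(text):
    """Return {chapter heading: Counter of lowercased words in that chapter}."""
    counts = {}
    current = None  # heading of the chapter being read; None before the first one
    for line in text.splitlines():
        stripped = line.strip()
        if CHAPTER_RE.match(stripped):
            # A new heading starts a new group; the heading's own words are not counted.
            current = stripped
            counts[current] = Counter()
        elif current is not None:
            counts[current].update(w.lower() for w in WORD_RE.findall(line))
    return counts


sample = """CHAPTER I. Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do
CHAPTER II. The Pool of Tears

'Curiouser and curiouser!' cried Alice
"""

result = count_words_by_chapter(sample)
for heading, words in result.items():
    print(heading, sum(words.values()))
```

Because a chapter's lines only make sense relative to the heading that precedes them, this depends on reading the file in order, which is the part I'm unsure how to express in Beam.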

Many thanks,
