beam-user mailing list archives

From Robert Bradshaw <rober...@google.com>
Subject Re: Pico WordCount
Date Wed, 07 Dec 2016 19:05:14 GMT
Nice. Of course for ultimate conciseness, you should have gone with Python :)

import re
import apache_beam as beam

with beam.Pipeline() as p:
  (p
   | beam.io.textio.ReadFromText("playing_cards.tsv")
   # FlatMap emits each word separately; Map would emit one list per line.
   | beam.FlatMap(lambda s: re.split("\\W+", s))
   | beam.combiners.Count.PerElement()
   # Count.PerElement yields (word, count) tuples.
   | beam.Map(lambda wc: "%s: %d" % wc)
   | beam.io.textio.WriteToText("output/stringcounts"))
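
For anyone without a Beam install handy, the logic of the pipeline above (split each line into words, count each word, format the pairs) can be sketched in plain Python. This is only a local, single-process approximation, not Beam code: the helper name count_words and the sample lines are made up for illustration, and unlike the pipeline it drops empty tokens and sorts the output so the result is deterministic.

```python
import re
from collections import Counter


def count_words(lines):
    """Local, single-process stand-in for the pipeline's transforms (hypothetical helper)."""
    # Split each line on runs of non-word characters and flatten the results,
    # dropping the empty strings re.split can yield at line edges.
    words = (w for line in lines for w in re.split(r"\W+", line) if w)
    # Count occurrences of each distinct word (the role of Count.PerElement).
    counts = Counter(words)
    # Format each (word, count) pair the way the final formatting step does;
    # sorting is an addition here, a distributed runner gives no ordering.
    return sorted("%s: %d" % (w, c) for w, c in counts.items())


if __name__ == "__main__":
    # Made-up sample input, standing in for the lines of the input file.
    sample = ["ace of spades", "king of hearts", "ace of hearts"]
    for row in count_words(sample):
        print(row)
```

The point of the Beam version is that the same three steps run unchanged on a distributed runner; the sketch is only meant to make the data flow easy to check by hand.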



On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> Good idea, Neelesh!
>
> Definitely something we can add to the beam-samples (a great complement to
> what I have on my GitHub).
>
> Regards
> JB
>
> On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>>
>> Perhaps we can add this to our examples.
>> Thank you Jesse. :)
>>
>> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>>
>>     Awesome !
>>
>>     Thanks Jesse !
>>
>>     Regards
>>     JB
>>
>>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>>
>>         I wrote a post on the smallest WordCount
>>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I
>>         could write. I go through everything line by line and talk about
>>         some of the newest DoFns that allow you to easily run regular
>>         expressions in a distributed way.
>>
>>         Thanks,
>>
>>         Jesse
>>
>>
>>
>>     --
>>     Jean-Baptiste Onofré
>>     jbonofre@apache.org
>>     http://blog.nanthrax.net
>>     Talend - http://www.talend.com
>>
>>
>>
>>
>> --
>> Neelesh Srinivas Salian
>> Customer Operations Engineer
>>
>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
