incubator-crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CRUNCH-67) Multiple writes in a pipeline are not performed
Date Wed, 19 Sep 2012 21:39:07 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-67.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.4.0

Solid-- just committed to master. Thanks for banging on the new stuff; I'm sure we'll find
more before the next release.
                
> Multiple writes in a pipeline are not performed
> -----------------------------------------------
>
>                 Key: CRUNCH-67
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-67
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, Scrunch
>    Affects Versions: 0.4.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-67.patch, ShakesMultiWrite.scala
>
>
> Consider the following simple PipelineApp (in Scala) that:
> 1. Reads in a text source.
> 2. Cleans the text of non-alphabetic characters.
> 3. Writes the sanitized text to a text file.
> 4. Computes word counts from the text.
> 5. Writes the word counts to a text file.
> When this code is executed, the write from step 5 is performed successfully, but the
> write from step 3 is not.
> object ShakesMultiWrite extends PipelineApp {
>   val shakes = read(From.textFile("shakes.txt"))
>   // Now let's clean-up the text
>   val cleanShakes = shakes.map {line =>
>     val cleanText = line.replaceAll( """[^A-Za-z\W]""", "").toLowerCase()
>     cleanText
>   }
>   cleanShakes.write(To.textFile("shakesText/cleanShakes"))
>   // Count words
>   val wordCounts = cleanShakes.flatMap { line =>
>       line
>         .split( """\W+""") // Split the text into words.
>         .filter(w => !w.isEmpty()) // Get rid of any empty words created.
>   }.count()
>   wordCounts.write(To.textFile("shakesText/wordCounts"))
>   // Runs the pipeline
>   run()
> }
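
For reference, the two outputs the report expects can be sketched with plain Scala collections. This is only an illustration under stated assumptions: it does not use Crunch or Scrunch, the object name ShakesMultiWriteLocal and the sample lines are hypothetical, and println stands in for the two writes. It reuses the regular expressions from the report unchanged. Note that the character class [^A-Za-z\W] matches only digits and underscores, so punctuation and whitespace survive the cleaning step.

```scala
// Local sketch only: plain Scala collections stand in for the pipeline's
// collections. No Crunch or Scrunch API is used here.
object ShakesMultiWriteLocal {
  // Same cleaning expression as the report. [^A-Za-z\W] matches word
  // characters that are not ASCII letters, i.e. digits and underscore.
  def clean(line: String): String =
    line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()

  // Same counting logic as the report: split on runs of non-word
  // characters, drop empty strings, then count occurrences per word.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\W+"""))
      .filter(w => !w.isEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input in place of shakes.txt.
    val shakes = Seq("To be, or not to be_2", "THAT is the question 42")

    val cleanShakes = shakes.map(clean)
    cleanShakes.foreach(println) // what the step 3 write should contain

    // what the step 5 write should contain
    wordCounts(cleanShakes).toSeq.sorted.foreach(println)
  }
}
```

Both outputs derive from the same cleanShakes value, which is the situation in which the report observed that only the later write was performed.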

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
