incubator-crunch-dev mailing list archives

From "Kiyan Ahmadizadeh (JIRA)"
Subject [jira] [Created] (CRUNCH-67) Multiple writes in a pipeline are not performed
Date Tue, 18 Sep 2012 23:46:11 GMT
Kiyan Ahmadizadeh created CRUNCH-67:

             Summary: Multiple writes in a pipeline are not performed
                 Key: CRUNCH-67
             Project: Crunch
          Issue Type: Bug
          Components: Core, Scrunch
    Affects Versions: 0.4.0
            Reporter: Kiyan Ahmadizadeh
            Assignee: Josh Wills

Consider the following simple PipelineApp (in Scala) that:
1. Reads in a text source.
2. Cleans the text of non-alphabetic characters.
3. Writes the sanitized text to a text file.
4. Computes word counts from the text.
5. Writes the word counts to a text file.

When this code is executed, the write from step 5 is performed successfully, but the write
from step 3 is not. 

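object ShakesMultiWrite extends PipelineApp {

  val shakes = read(From.textFile("shakes.txt"))

  // Now let's clean up the text
  val cleanShakes = shakes.map { line =>
    val cleanText = line.replaceAll( """[^A-Za-z\W]""", "").toLowerCase()
    cleanText
  }
  // Step 3: write the sanitized text (the output path here is an assumption)
  write(cleanShakes, To.textFile("cleanShakes"))

  // Count words
  val wordCounts = cleanShakes.flatMap { line =>
    line
        .split( """\W+""") // Split the text into words.
        .filter(w => !w.isEmpty()) // Get rid of any empty words created.
  }.count
  // Step 5: write the word counts (the output path here is an assumption)
  write(wordCounts, To.textFile("wordCounts"))

  // Runs the pipeline
  run
}
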
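For reference, here is a minimal sketch of what the two writes would be expected to contain, using plain Scala collections instead of Crunch. The object name `ShakesExpectedOutputs` and the sample input line are hypothetical; only the cleaning and splitting expressions are taken from the report above.

```scala
object ShakesExpectedOutputs {
  // Same cleaning expression as in the report. With Java's default regex
  // semantics it removes digits and underscores, then lowercases the line.
  def clean(line: String): String =
    line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()

  // Same tokenization as in the report: split on runs of non-word
  // characters and drop any empty strings.
  def words(line: String): Seq[String] =
    line.split("""\W+""").filter(w => !w.isEmpty()).toSeq

  def main(args: Array[String]): Unit = {
    val shakes = Seq("To be, or not 2 be") // hypothetical input line

    // What the write in step 3 should contain.
    val cleanShakes = shakes.map(clean)

    // What the write in step 5 should contain.
    val wordCounts = cleanShakes
      .flatMap(words)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

    cleanShakes.foreach(println)
    wordCounts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Both collections are non-empty for any non-empty input, so a correct pipeline run should leave output at both targets, not only at the word-count target.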