incubator-crunch-dev mailing list archives

From "Kiyan Ahmadizadeh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-67) Multiple writes in a pipeline are not performed
Date Tue, 18 Sep 2012 23:46:11 GMT
Kiyan Ahmadizadeh created CRUNCH-67:
---------------------------------------

             Summary: Multiple writes in a pipeline are not performed
                 Key: CRUNCH-67
                 URL: https://issues.apache.org/jira/browse/CRUNCH-67
             Project: Crunch
          Issue Type: Bug
          Components: Core, Scrunch
    Affects Versions: 0.4.0
            Reporter: Kiyan Ahmadizadeh
            Assignee: Josh Wills


Consider the following simple PipelineApp (in Scala) that:
1. Reads in a text source.
2. Cleans the text of non-alphabetic characters.
3. Writes the sanitized text to a text file.
4. Computes word counts from the text.
5. Writes the word counts to a text file.

When this code is executed, the write from step 5 is performed successfully, but the write
from step 3 is not. 

{code}
object ShakesMultiWrite extends PipelineApp {

  val shakes = read(From.textFile("shakes.txt"))

  // Now let's clean up the text
  val cleanShakes = shakes.map {line =>
    val cleanText = line.replaceAll( """[^A-Za-z\W]""", "").toLowerCase()
    cleanText
  }
  cleanShakes.write(To.textFile("shakesText/cleanShakes"))

  // Count words
  val wordCounts = cleanShakes.flatMap { line =>
    line
      .split( """\W+""") // Split the text into words.
      .filter(w => !w.isEmpty()) // Get rid of any empty words created.
  }.count()

  wordCounts.write(To.textFile("shakesText/wordCounts"))

  // Runs the pipeline
  run()
}
{code}
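
For reference, here is a minimal plain-Scala sketch (no Crunch involved) of what the two outputs would be expected to contain, so that the missing step 3 output is easy to compare against. The object name ExpectedOutputs and the sample lines are hypothetical and only stand in for shakes.txt; the regex and the split are copied from the pipeline above. Note that the regex as written strips only digits and underscores; punctuation and whitespace are left for the \W+ split to handle.

```scala
// Plain-Scala sketch of the expected contents of each write.
// ExpectedOutputs and the sample input are illustrative assumptions,
// not part of Crunch or of the original application.
object ExpectedOutputs {

  // Same cleaning step as in the pipeline: drops digits and underscores,
  // then lower-cases the line.
  def clean(line: String): String =
    line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()

  // Same tokenization as in the pipeline, followed by a local count.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\W+"""))   // Split the text into words.
      .filter(w => !w.isEmpty)        // Get rid of any empty words created.
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Hypothetical input lines standing in for shakes.txt.
    val sample = Seq("To be, or not to be", "Act 3, Scene 1")

    // What step 3 would be expected to write.
    val cleaned = sample.map(clean)
    cleaned.foreach(println)

    // What step 5 would be expected to write.
    wordCounts(cleaned).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

In the actual pipeline both of these results are requested via write() before run() is called, so both would be expected on disk once run() returns.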

