incubator-crunch-dev mailing list archives

From "Kiyan Ahmadizadeh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-67) Multiple writes in a pipeline are not performed
Date Tue, 18 Sep 2012 23:46:16 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kiyan Ahmadizadeh updated CRUNCH-67:
------------------------------------

    Description: 
Consider the following simple PipelineApp (in Scala) that:
1. Reads in a text source.
2. Cleans the text of non-alphabetic characters.
3. Writes the sanitized text to a text file.
4. Computes word counts from the text.
5. Writes the word counts to a text file.

When this code is executed, the write from step 5 is performed successfully, but the write
from step 3 is not. 

object ShakesMultiWrite extends PipelineApp {

  val shakes = read(From.textFile("shakes.txt"))

  // Now let's clean-up the text
  val cleanShakes = shakes.map {line =>
    val cleanText = line.replaceAll( """[^A-Za-z\W]""", "").toLowerCase()
    cleanText
  }
  cleanShakes.write(To.textFile("shakesText/cleanShakes"))

  // Count words
  val wordCounts = cleanShakes.flatMap { line =>
      line
        .split( """\W+""") // Split the text into words.
        .filter(w => !w.isEmpty()) // Get rid of any empty words created.
  }.count()

  wordCounts.write(To.textFile("shakesText/wordCounts"))

  // Runs the pipeline
  run()
}

  was:
Consider the following simple PipelineApp (in Scala) that:
1. Reads in a text source.
2. Cleans the text of non-alphabetic characters.
3. Writes the sanitized text to a text file.
4. Computes word counts from the text.
5. Writes the word counts to a text file.

When this code is executed, the write from step 5 is performed successfully, but the write
from step 3 is not. 

{code}
object ShakesMultiWrite extends PipelineApp {

  val shakes = read(From.textFile("shakes.txt"))

  // Now let's clean-up the text
  val cleanShakes = shakes.map {line =>
    val cleanText = line.replaceAll( """[^A-Za-z\W]""", "").toLowerCase()
    cleanText
  }
  cleanShakes.write(To.textFile("shakesText/cleanShakes"))

  // Count words
  val wordCounts = cleanShakes.flatMap { line =>
      line
        .split( """\W+""") // Split the text into words.
        .filter(w => !w.isEmpty()) // Get rid of any empty words created.
  }.count()

  wordCounts.write(To.textFile("shakesText/wordCounts"))

  // Runs the pipeline
  run()
}
{code}

    
> Multiple writes in a pipeline are not performed
> -----------------------------------------------
>
>                 Key: CRUNCH-67
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-67
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, Scrunch
>    Affects Versions: 0.4.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>
> Consider the following simple PipelineApp (in Scala) that:
> 1. Reads in a text source.
> 2. Cleans the text of non-alphabetic characters.
> 3. Writes the sanitized text to a text file.
> 4. Computes word counts from the text.
> 5. Writes the word counts to a text file.
> When this code is executed, the write from step 5 is performed successfully, but the write
> from step 3 is not.
> object ShakesMultiWrite extends PipelineApp {
>   val shakes = read(From.textFile("shakes.txt"))
>   // Now let's clean-up the text
>   val cleanShakes = shakes.map {line =>
>     val cleanText = line.replaceAll( """[^A-Za-z\W]""", "").toLowerCase()
>     cleanText
>   }
>   cleanShakes.write(To.textFile("shakesText/cleanShakes"))
>   // Count words
>   val wordCounts = cleanShakes.flatMap { line =>
>       line
>         .split( """\W+""") // Split the text into words.
>         .filter(w => !w.isEmpty()) // Get rid of any empty words created.
>   }.count()
>   wordCounts.write(To.textFile("shakesText/wordCounts"))
>   // Runs the pipeline
>   run()
> }
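
For reference when checking a fix, here is a minimal plain-Scala sketch of what the two outputs in the report would be expected to contain. It deliberately does not use the Crunch or Scrunch API; the object name ExpectedOutputs, the helper names, and the sample lines are hypothetical and only illustrate the same cleaning and counting logic on in-memory data.

```scala
object ExpectedOutputs {
  // Same expression as in the report. Note that [^A-Za-z\W] matches word
  // characters that are not letters (digits and underscore), so punctuation
  // and whitespace are kept and only removed later by the \W+ split.
  def clean(line: String): String =
    line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()

  // In-memory equivalent of the flatMap + count() step in the report.
  def wordCounts(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.split("""\W+""").filter(w => !w.isEmpty))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for shakes.txt.
    val sample = Seq("To be, or not to be", "Act 1, Scene_2")

    // What shakesText/cleanShakes would be expected to hold (step 3).
    val cleaned = sample.map(clean)
    cleaned.foreach(println)

    // What shakesText/wordCounts would be expected to hold (step 5).
    wordCounts(cleaned).toSeq.sortBy(_._1).foreach {
      case (word, n) => println(s"$word\t$n")
    }
  }
}
```

Both outputs derive from the same cleaned collection, so a run that produces the word counts but no cleaned text has skipped a write rather than a computation.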

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
