incubator-crunch-user mailing list archives

From Mike Barretta
Subject pipeline writeTextFile spitting out encoded files?
Date Tue, 12 Feb 2013 16:33:15 GMT
I'm running some simple parallelDos which emit Strings.  When I write the
resulting PCollection out using pipeline.writeTextFile(), I see garbled
garbage like:




The code (Groovy; function() is a passed-in closure that calls
emitter.emit()) looks like:

collection.parallelDo( + ":" + table, new DoFn<Pair<ColumnKey, ColumnDataArrayWritable>, String>() {
  void process(Pair<ColumnKey, ColumnDataArrayWritable> input, Emitter<String> emitter) {
    // Assemble one object per array element and hand it to the closure, which emits
    input.second().toArray().each {
      def obj = assembler.assemble([PetalUtils.toThrift(input.first(), it)])
      function(obj, emitter)
    }
  }
}, Writables.strings())
crunchPipeline.writeTextFile(collection, "$outputPath/$outputDir")

It's worth noting I saw the same output when running plain word count.

Is this my fault, or is it caused by the cluster (cluster compression, etc.)?
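
One way to test the compression theory is to copy a single part file off HDFS (for example with hadoop fs -get) and look at its first few bytes. The sketch below is a heuristic diagnostic written in plain Java, not part of the Crunch API; the class name OutputSniffer and the method sniff are hypothetical names chosen for illustration. It only recognizes the well-known gzip, bzip2, zlib, and Hadoop SequenceFile signatures, and the zlib check can occasionally match ordinary text, so treat the result as a hint rather than proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guesses a file's container format from its leading bytes.
final class OutputSniffer {

    static String sniff(byte[] head) {
        // gzip streams start with the magic bytes 0x1f 0x8b
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        // bzip2 streams start with the ASCII characters "BZh"
        if (head.length >= 3 && head[0] == 'B' && head[1] == 'Z' && head[2] == 'h') {
            return "bzip2";
        }
        // Hadoop SequenceFiles start with the ASCII characters "SEQ"
        if (head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return "sequencefile";
        }
        // zlib header: compression method 8, window bits <= 7, and the
        // 16-bit header value is a multiple of 31 (heuristic, may false-positive)
        if (head.length >= 2
                && (head[0] & 0x0f) == 8
                && ((head[0] & 0xf0) >> 4) <= 7
                && ((((head[0] & 0xff) << 8) | (head[1] & 0xff)) % 31) == 0) {
            return "zlib";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OutputSniffer <local-part-file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[4];
            int n = in.read(head);
            // Only pass along the bytes that were actually read
            System.out.println(sniff(Arrays.copyOf(head, Math.max(n, 0))));
        }
    }
}
```

If a part file reports gzip, bzip2, or zlib, the job output is probably being compressed by a cluster-side setting and the text is intact underneath; "unknown" on a file that still looks garbled would point back at the job itself.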
