avro-user mailing list archives

From Terry Healy <the...@bnl.gov>
Subject Re: Output from AVRO mapper
Date Sat, 22 Dec 2012 18:33:13 GMT
<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Thanks Russell. This looks like a much easier solution, which I will
    look at more carefully in the near future. But at this point I don't
    want to walk away from the Java M/R solution just because I can't
    work it out. I know it works; I'm just missing something basic in
    my understanding.<br>
    <br>
    -Terry<br>
    <br>
    P.S. Cool font too.<br>
    <br>
    <div class="moz-cite-prefix">On 12/21/12 7:42 PM, Russell Jurney
      wrote:<br>
    </div>
    <blockquote cite="mid:-5487704440750579907@unknownmsgid" type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <div>I don't mean to harp, but this is a few lines in Pig:</div>
      <pre class="pig" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:10px;padding-right:10px;padding-bottom:10px;padding-left:10px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font:normal
normal normal 1em/normal 'andale mono','lucida console',monospace;vertical-align:baseline;display:block;width:auto;clear:none;overflow-x:visible;overflow-y:visible"><span
class="Apple-style-span" style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><div>/*
Load Avro jars and define shortcut */</div><div>register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar </div>
<div>register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar </div><div>register
/me/pig/contrib/piggybank/java/piggybank.jar </div><div>define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();</div><div>

</div><div>/* Load Avros */</div><div>input = load 'my.avro' using
AvroStorage();</div><div>
</div><div>/* Verify input */</div><div>describe input;</div><div>Illustrate
input;</div><div>
</div><div>/* Convert Avros to JSON */</div>
<div>store input into 'my.json' using com.twitter.elephantbird.pig.store.JsonStorage();</div><div>store
input into 'my.json.lzo' using com.twitter.elephantbird.pig.store.LzoJsonStorage();</div><div>
</div>
<div>/* Convert simple Avros to TSV */</div><div>store input into 'my.tsv';</div><div>
</div><div>/* Convert Avros to SequenceFiles */</div>REGISTER '/path/to/elephant-bird.jar';<div> store
 input into 'my.seq' using com.twitter.elephantbird.pig.store.SequenceFileStorage(</div>
<div>    /* example: */</div><div>    '-c com.twitter.elephantbird.pig.util.IntWritableConverter',</div><div> 
  '-c com.twitter.elephantbird.pig.util.TextConverter'</div><div>)<span class="Apple-style-span"
style="">;</span></div>
<div>
</div><div>/* Convert Avros to Protobufs */</div><div>store input
into 'input.protobuf' using com.twitter.elephantbird.examples.proto.pig.store.ProfileProtobufB64LinePigStorage();</div><div>
</div><div>/* Convert Avros to a Lucene Index */</div>
store input into 'input.lucene' using LuceneIndexStorage('com.example.MyPigLuceneIndexOutputFormat');</span><font
class="Apple-style-span" face="Helvetica"><span class="Apple-style-span" style="white-space:normal">

</span></font></pre>
      <div><span class="Apple-style-span"
style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal">There
          are also drivers for most NoSQLish databases...</span></div>
      <div><span class="Apple-style-span"
style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><br>
        </span></div>
      <div>Russell Jurney <a moz-do-not-send="true"
          href="http://datasyndrome.com">http://datasyndrome.com</a></div>
      <div><br>
        On Dec 20, 2012, at 9:33 AM, Terry Healy &lt;<a
          moz-do-not-send="true" href="mailto:thealy@bnl.gov">thealy@bnl.gov</a>&gt;
        wrote:<br>
        <br>
      </div>
      <blockquote type="cite">
        <div><span>I'm just getting started using AVRO within Map/Reduce
            and trying to</span><br>
          <span>convert some existing non-AVRO code to use AVRO input.
            So far the data</span><br>
          <span>that previously was stored in tab delimited files has
            been converted to</span><br>
          <span>.avro successfully as checked with avro-tools.</span><br>
          <span></span><br>
          <span>Where I'm getting hung up extending beyond my book-based
            examples is in</span><br>
          <span>attempting to read from AVRO (using generic records)
            where the mapper</span><br>
          <span>output is NOT in AVRO format. I can't seem to reconcile
            extending</span><br>
          <span>AvroMapper and NOT using AvroCollector.</span><br>
          <span></span><br>
          <span>Here are snippets of code that show my non-AVRO M/R code
            and my</span><br>
          <span>[failing] attempts to make this change. If anyone can
            help me along it</span><br>
          <span>would be very much appreciated.</span><br>
          <span></span><br>
          <span>-Terry</span><br>
          <span></span><br>
          <span>&lt;code&gt;</span><br>
          <span>Pre-Avro version: (Works fine with .tsv input format)</span><br>
          <span></span><br>
          <span>    public static class HdFlowMapper extends
            MapReduceBase</span><br>
          <span>            implements Mapper&lt;Text, HdFlowWritable,
            LongPair,</span><br>
          <span>HdFlowWritable&gt; {</span><br>
          <span></span><br>
          <span></span><br>
          <span>        @Override</span><br>
          <span>        public void map(Text key, HdFlowWritable value,</span><br>
          <span>                OutputCollector&lt;LongPair,
            HdFlowWritable&gt; output,</span><br>
          <span>                Reporter reporter) throws IOException
{</span><br>
          <span></span><br>
          <span>        ...//</span><br>
          <span>                outKey = new LongPair(value.getSrcIp(),
            value.getFirst());</span><br>
          <span></span><br>
          <span>                HdFlowWritable outValue = value; // pass
            it all through</span><br>
          <span>                output.collect(outKey, outValue);</span><br>
          <span>    }</span><br>
          <span></span><br>
          <span></span><br>
          <span></span><br>
          <span>AVRO attempt:</span><br>
          <span></span><br>
          <span></span><br>
          <span>        conf.setOutputFormat(TextOutputFormat.class);</span><br>
          <span>        conf.setOutputKeyClass(LongPair.class);</span><br>
          <span>        conf.setOutputValueClass(AvroFlowWritable.class);</span><br>
          <span></span><br>
          <span>        SCHEMA = new
            Schema.Parser().parse(NetflowSchema);</span><br>
          <span>        AvroJob.setInputSchema(conf, SCHEMA);</span><br>
          <span>        //AvroJob.setOutputSchema(conf, SCHEMA);</span><br>
          <span>        AvroJob.setMapperClass(conf,
            AvroFlowMapper.class);</span><br>
          <span>        AvroJob.setReducerClass(conf,
            AvroFlowReducer.class);</span><br>
          <span></span><br>
          <span>....</span><br>
          <span></span><br>
          <span>         public static class AvroFlowMapper&lt;K&gt;
            extends AvroMapper&lt;K,</span><br>
          <span>OutputCollector&gt; {</span><br>
          <span></span><br>
          <span></span><br>
          <span>        @Override    </span><br>
          <span>    ** IDE: "Method does not override or implement a
            method from a supertype"</span><br>
          <span></span><br>
          <span>        public void map(K datum,
            OutputCollector&lt;LongPair,</span><br>
          <span>AvroFlowWritable&gt; collector, Reporter reporter)
            throws IOException {</span><br>
          <span></span><br>
          <span></span><br>
          <span>            GenericRecord record = (GenericRecord)
            datum;</span><br>
          <span>            afw = new AvroFlowWritable(record);</span><br>
          <span>        // ...</span><br>
          <span>            collector.collect(outKey, afw);</span><br>
          <span>}</span><br>
          <span></span><br>
          <span>&lt;/code&gt;</span><br>
          <span></span><br>
        </div>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>
