avro-user mailing list archives

From Ron Bodkin <rbod...@thinkbiganalytics.com>
Subject Re: How to get started with examples on avro
Date Fri, 28 Jan 2011 23:43:22 GMT
The Colossal Pipe (https://github.com/ThinkBigAnalytics/colossal-pipe)
framework also supports working with Avro as its native format for Java
map-reduce, but it also lets you read in JSON or text files as input to
mappers, making it fairly easy to use for this kind of conversion job. E.g.,
the heart of the program would be just this:

ColFile inlogs = ColFile.at("/dfs/logs/json/" + hr /* e.g. 2011/01/28/03 */)
    .of(LogFormat.class).jsonFormat();
ColFile outlogs = ColFile.at("/dfs/logs/avro/" + hr).of(Log.class);
ColPhase copy = new ColPhase().reads(inlogs).writes(outlogs)
    .map(IdentityMapper.class).groupBy("timestamp")
    .reduce(IdentityReducer.class);

ColPipe conversion = new ColPipe(getClass()).named("log conversion");
conversion.produces(outlogs);

You'd currently define an identity mapper and reducer (soon these will be the
defaults):

public static class IdentityMapper extends BaseMapper<Log, Log> {
  @Override
  public void map(Log in, Log out, ColContext<Log> context) {
    super.map(in, out, context);
  }
}

public static class IdentityReducer extends BaseReducer<Log, Log> {
  @Override
  public void reduce(Iterable<Log> in, Log out, ColContext<Log> context) {
    super.reduce(in, out, context);
  }
}

   
Ron

Ron Bodkin
CEO
Think Big Analytics
m: +1 (415) 509-2895



From:  Philip Zeyliger <philip@cloudera.com>
Reply-To:  <user@avro.apache.org>
Date:  Fri, 28 Jan 2011 13:44:42 -0800
To:  <user@avro.apache.org>
Subject:  Re: How to get started with examples on avro

Felix,

After you've figured out how to work it for your application, I do encourage
you to contribute (https://cwiki.apache.org/AVRO/how-to-contribute.html)
examples to the open source project.  We'll find a place for them!

-- Philip

On Fri, Jan 28, 2011 at 12:29 PM, felix gao <gre1600@gmail.com> wrote:
> Thanks for the quick reply.  I am interested in doing this through the java
> implementation and I would like to do it in parallel that utilizes the
> mapreduce framework.
> 
> 
> On Fri, Jan 28, 2011 at 12:22 PM, Harsh J <qwertymaniac@gmail.com> wrote:
>> Based on the language you're targeting, have a look at its test cases
>> available in the project's version control:
>> http://svn.apache.org/repos/asf/avro/trunk/lang/ [You can check it out
>> via SVN, or via Git mirrors]
>> 
>> Another good resource on the ends of Avro (Data and RPC) is by phunt
>> at http://github.com/phunt/avro-rpc-quickstart#readme
>> 
>> I had written a python data-file centric snippet for Avro a while ago
>> at my blog; it may help if you're looking to get started with Python
>> (although it does not cover all aspects, which the functions in the
>> available test cases for lang/python do):
>> http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
>> 
>> On Sat, Jan 29, 2011 at 1:34 AM, felix gao <gre1600@gmail.com> wrote:
>>> Hi all,
>>> I am trying to convert a lot of our existing logs into avro format in
>>> hadoop.  I am not sure if there are any examples to follow.
>>> Thanks,
>>> Felix
>> 
>> 
>> 
>> --
>> Harsh J
>> www.harshj.com
> 



