avro-user mailing list archives

From selvi k <gridsngat...@gmail.com>
Subject Getting started with Avro + Reading from an Avro formatted file
Date Tue, 24 Jan 2012 15:31:36 GMT
Hello All,

I would like some suggestions on where to start with the Avro project.

I want to read an Avro-formatted log file (specifically, the job history
log file that Hadoop writes at the end of a job) and produce a
comma-separated values (CSV) file containing certain log entries. I need
CSV because it is the format accepted by the post-processing software I am
working with (e.g., MATLAB).
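The CSV step can be kept separate from how the Avro file is decoded: treat each decoded log entry as a Python dict, keep only the entries of interest, and write a fixed set of columns. Below is a minimal sketch using only the standard library `csv` module. The function name `records_to_csv` and the field names `event`, `jobid` and `finish_time` are hypothetical placeholders, not names taken from the Hadoop history schema. Real history entries may be nested, in which case they would need flattening first (not shown).

```python
import csv
import io
from typing import Callable, Iterable, Mapping, Sequence, TextIO


def records_to_csv(
    records: Iterable[Mapping],
    fields: Sequence[str],
    out: TextIO,
    keep: Callable[[Mapping], bool] = lambda rec: True,
) -> int:
    """Write the chosen fields of each kept record as one CSV row.

    Missing fields become empty cells, so every row has the same columns,
    which keeps the file easy to load into tools such as MATLAB.
    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(fields)  # header row
    written = 0
    for rec in records:
        if not keep(rec):  # skip log entries we are not interested in
            continue
        writer.writerow([rec.get(name, "") for name in fields])
        written += 1
    return written


# Example with made-up records (all field names here are hypothetical).
sample = [
    {"event": "finish", "jobid": "job_1", "finish_time": 10},
    {"event": "start", "jobid": "job_1"},
]
buf = io.StringIO()
records_to_csv(sample, ["jobid", "finish_time"], buf,
               keep=lambda r: r.get("event") == "finish")
print(buf.getvalue())
```

Passing the filter as a function keeps the "which entries" decision out of the CSV logic, so the same writer can serve several different extracts.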

Initially I used a Bash script with grep and awk to build my CSV file,
because I needed only a few values from it and a quick script did the job.
I didn't try to learn what format the log file was in and take advantage
of it (my bad!). Now that I need to scale up and want a reliable way to
parse the file, I would like to do it the right way.

My question is this: for the goal above, could you please suggest steps I
can follow, such as reading material and libraries I could try? Going
through the Quick Start Guide and FAQ, I see that much of the information
there is geared toward someone who wants to use Avro's data serialization
and RPC functionality. Given that I only want to read existing files,
where should I start?
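For read-only use, the Python `avro` package's `DataFileReader` together with a `DatumReader` may be all that is needed: an Avro object container file stores the writer's schema in its header, so no separate schema has to be supplied to decode it. The sketch below assumes the `avro` package is installed (`pip install avro`; older releases may differ slightly) and that the history log is an Avro object container file. Because that second assumption may not hold for every log, the sketch first checks for the four magic bytes `Obj\x01` that every container file begins with. `looks_like_avro_container` and `read_avro_records` are hypothetical helper names.

```python
from typing import Iterator

# Every Avro object container file starts with ASCII 'O', 'b', 'j', then byte 1.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(header: bytes) -> bool:
    """Return True if `header` begins with the Avro container-file magic."""
    return header[:4] == AVRO_MAGIC


def read_avro_records(path: str) -> Iterator[dict]:
    """Yield each record in an Avro object container file as a dict.

    Assumes the `avro` package is installed; it is imported inside the
    generator, so nothing is imported until the first record is requested.
    """
    with open(path, "rb") as fh:
        if not looks_like_avro_container(fh.read(4)):
            raise ValueError(f"{path} is not an Avro object container file")
        fh.seek(0)  # DataFileReader expects to read the header itself
        from avro.datafile import DataFileReader
        from avro.io import DatumReader

        # No reader schema is passed, so the writer schema stored in the
        # file header is used to decode every record.
        reader = DataFileReader(fh, DatumReader())
        try:
            yield from reader
        finally:
            reader.close()
```

Usage, with a hypothetical path: `for rec in read_avro_records("job.avro"): print(rec)`. Each `rec` is a dict keyed by the schema's field names. If the magic check fails, the file is in some other encoding and needs a different reader.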

I am comfortable scripting in Bash and Perl. Since I only see support for
Java, Python and Ruby, I think I will take this as an opportunity to learn
Python and get up to speed.

Thanks a lot.

