hadoop-common-user mailing list archives

From Leonardo Urbina <lurb...@mit.edu>
Subject Re: Hadoop Serialization: Avro
Date Sun, 27 Nov 2011 03:48:50 GMT
Thanks, I will send the question to that list as well.

Best,
-Leo

Sent from my phone

On Nov 26, 2011, at 7:32 PM, Brock Noland <brock@cloudera.com> wrote:

> Hi,
>
> Depending on the response you get here, you might also post the
> question separately on avro-user.
>
> On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbina <lurbina@mit.edu> wrote:
>> Hey everyone,
>>
>> First time posting to the list. I'm currently writing a Hadoop job that
>> will run daily and whose output will be part of the next day's
>> input. Also, the output will potentially be read by other programs for
>> later analysis.
>>
>> Since my program's output is used as part of the next day's input, it would
>> be nice if it was stored in some binary format that is easy to read the
>> next time around. But this format also needs to be readable by other
>> outside programs, not necessarily written in Java. After searching for a
>> while it seems that Avro is what I want to be using. In any case, I have
>> been looking around for a while and I can't seem to find a single example
>> of how to use Avro within a Hadoop job.
>>
>> It seems that in order to use Avro I need to change the io.serializations
>> value, but I don't know which value should be specified. Furthermore, I
>> found that there are Avro{Input,Output}Format classes, but as far as I
>> understand these require a series of other Avro classes such as
>> AvroWrapper, AvroKey, AvroValue, and, as far as I can tell, Avro* (with *
>> replaced by pretty much any Hadoop class name). It seems, however, that
>> these exist so that the Avro format is used throughout the Hadoop
>> process to pass objects around.
>>
>> I just want to use Avro to save my output and read it again as input next
>> time around. So far I have been using SequenceFile{Input,Output}Format and
>> have implemented the Writable interface in the relevant classes, but
>> this is not portable to other languages. Is there a way to use Avro without
>> a substantial rewrite (using Avro* classes) of my Hadoop job? Thanks in
>> advance,
>>
>> Best,
>> -Leo
>>
>> --
>> Leo Urbina
>> Massachusetts Institute of Technology
>> Department of Electrical Engineering and Computer Science
>> Department of Mathematics
>> lurbina@mit.edu
>>
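The portability requirement in the quoted question (a binary format that non-Java programs can read) can be illustrated with a small sketch of how the Avro specification encodes primitive values: a long is zig-zag encoded and written as a variable-length integer, a string is its UTF-8 byte length (as a long) followed by the bytes, and a record is its fields concatenated in schema order. The two-field record below (`word`, `count`) and all function names are hypothetical, chosen only for illustration. This is not the Avro library's API, and it omits the container-file header, embedded schema, and sync markers that real Avro data files carry.

```python
# Sketch of Avro-style binary encoding using only the standard library.
# Hypothetical record layout: {"word": string, "count": long}.

def encode_long(n):
    """Zig-zag encode a 64-bit signed integer, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf, pos=0):
    """Return (value, next_position) for a varint starting at buf[pos]."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo the zig-zag mapping


def encode_string(s):
    """Length-prefixed UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def decode_string(buf, pos=0):
    length, pos = decode_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def encode_record(word, count):
    """Fields are written back to back in schema order, with no field names."""
    return encode_string(word) + encode_long(count)


def decode_record(buf):
    word, pos = decode_string(buf, 0)
    count, _ = decode_long(buf, pos)
    return word, count


if __name__ == "__main__":
    payload = encode_record("hadoop", 42)
    print(payload.hex())
    print(decode_record(payload))
```

Because the byte layout depends only on the schema and not on any Java class, a reader in any language can decode the same bytes. In practice one would use an Avro library rather than hand-rolled encoding like this.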
