avro-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
Date Wed, 18 Aug 2010 17:49:50 GMT
On Wed, Aug 18, 2010 at 11:07 PM, Doug Cutting <cutting@apache.org> wrote:
> On 08/18/2010 10:18 AM, ey-chih chow wrote:
>> Thanks. But done this way, what advantage do we get from
>> Avro?
> The Avro MapReduce API is easiest to use when both inputs and outputs are
> Avro data.
> If inputs are not Avro data, but you want to use the rest of the Avro MR
> API, then you'd need to write an InputFormat that produces an AvroWrapper<T>
> where T is a type that Avro can serialize.
> Another alternative might be to first convert your inputs to Avro data
> files.  For example, one can use Avro's 'fromtext' tool to convert
> line-oriented files into equivalent compressed, splittable Avro data files.
> This could be done as log files are loaded into HDFS, since this tool
> accepts Hadoop paths as output.
> We hope to add more tools for such conversion/ingest, e.g.:
> https://issues.apache.org/jira/browse/AVRO-458

Off-topic, but is there any work being done on this already? I saw one
of them tagged with 'GSOC', so I wish to know before I sink something

> We also expect that systems like Flume will produce Avro data files.
> Doug

Harsh J
