avro-user mailing list archives

From "R. Tyler Ballance" <ty...@monkeypox.org>
Subject Re: Using avro with hadoop streaming
Date Thu, 22 Apr 2010 03:19:42 GMT

On Wed, 21 Apr 2010, Doug Cutting wrote:

> R. Tyler Ballance wrote:
> >Is hadoop streaming support actually /working/ in trunk?
> Hadoop Streaming access to Avro data?  No.  Hadoop Streaming is
> primarily intended for textual, CSV-style data.
> To better integrate Avro data into Perl, Python and Ruby
> mapreduce programs, we hope to build something like Hadoop Pipes.
>   https://issues.apache.org/jira/browse/AVRO-512
> I hope to work on this in the coming weeks.

Ah, that makes things clearer. Mind you, I'm a hadidiot; I'm more
into generating the Avro data (and the RPC!).

I'll follow the ticket, looking forward to seeing that going in.

> AVRO-493 only provides Avro data to Java mapreduce programs.  The
> best documentation for it currently is its unit test source code.
> http://tinyurl.com/yz8bd22
> http://tinyurl.com/2a3xbu8

Handy links. I don't think we're going to invest any time in writing anything
other than Python code for the time being. Until you have the chance to crank
through AVRO-512, our interim solution has been to pre-process Avro logs:
we pull the schema out into a separate file and dump the records to a textual
JSON file suitable for streaming into Hadoop.
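
A rough sketch of what such a pre-processing step could look like, not the
actual script. It assumes the Apache Avro Python package is installed, that
one JSON document per line is a convenient layout for Hadoop Streaming, and
that stringifying values JSON cannot encode is acceptable. The function
names and file paths are hypothetical.

```python
import json


def records_to_json_lines(records):
    """Yield one single-line JSON document per record, newline-terminated."""
    for record in records:
        # sort_keys keeps the output stable between runs; default=str is an
        # assumed fallback for values json cannot encode (e.g. bytes fields).
        yield json.dumps(record, sort_keys=True, default=str) + "\n"


def split_avro_file(avro_path, schema_path, json_path):
    """Write an Avro data file's schema and records to two text files."""
    # Imported lazily so records_to_json_lines works without the avro package.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    with open(avro_path, "rb") as raw:
        reader = DataFileReader(raw, DatumReader())

        # The writer's schema is stored in the file header metadata.
        schema_json = reader.get_meta("avro.schema")
        if isinstance(schema_json, bytes):
            schema_json = schema_json.decode("utf-8")
        with open(schema_path, "w") as schema_out:
            schema_out.write(schema_json)

        # Iterating the reader yields each record as a Python object.
        with open(json_path, "w") as json_out:
            json_out.writelines(records_to_json_lines(reader))
```

Something like split_avro_file("events.avro", "events.avsc", "events.json")
would then leave a text file that a streaming mapper can read line by line
from stdin and parse with json.loads.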

-R. Tyler Ballance
  Jabber: rtyler@jabber.org
  GitHub: http://github.com/rtyler
Identica: http://identi.ca/dero
 Twitter: http://twitter.com/agentdero
    Blog: http://unethicalblogger.com
