Date: Wed, 21 Apr 2010 20:19:42 -0700
From: "R. Tyler Ballance"
To: avro-user@hadoop.apache.org
Subject: Re: Using avro with hadoop streaming
Message-ID: <20100422031942.GB28156@kiwi.sharlinx.com>
References: <20100414202145.GC15366@kiwi.sharlinx.com> <4BCF733A.80807@apache.org>
In-Reply-To: <4BCF733A.80807@apache.org>

On Wed, 21 Apr 2010, Doug Cutting wrote:
> R. Tyler Ballance wrote:
> >Is hadoop streaming support actually /working/ in trunk?
>
> Hadoop Streaming access to Avro data?  No.  Hadoop Streaming is
> primarily intended for textual, CSV-style data.
>
> To better integrate Avro data into Perl, Python and Ruby
> mapreduce programs, we hope to build something like Hadoop Pipes.
>
> https://issues.apache.org/jira/browse/AVRO-512
>
> I hope to work on this in the coming weeks.

Ah, that makes things a bit clearer to me. Mind you, I'm a hadidiot; I'm
more into generating the Avro data (and the RPC!). I'll follow the ticket,
and I'm looking forward to seeing it go in.

>
> AVRO-493 only provides Avro data to Java mapreduce programs.  The
> best documentation for it currently is its unit test source code.
>
> http://tinyurl.com/yz8bd22
> http://tinyurl.com/2a3xbu8

Handy links, though I don't think we're going to invest any time in writing
anything other than Python code for the time being. Until you have the
chance to crank through #512, our interim solution has been to pre-process
the Avro logs, pulling the schema out into a separate file and dumping the
data to a textual JSON file suitable for streaming into Hadoop.

Cheers,
-R. Tyler Ballance
--------------------------------------
 Jabber: rtyler@jabber.org
 GitHub: http://github.com/rtyler
Identica: http://identi.ca/dero
Twitter: http://twitter.com/agentdero
   Blog: http://unethicalblogger.com
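
For illustration, the pre-processing step described in the message above could look roughly like the sketch below. It is not the code actually in use. The function names and file paths are hypothetical, writing one JSON object per line is an assumption about what "suitable for streaming" means, and `convert_avro_file` assumes the `avro` Python package (with `DataFileReader`, `DatumReader`, and the `avro.schema` metadata key) is installed.

```python
import json


def dump_for_streaming(schema_json, records, schema_path, records_path):
    """Write the schema to its own file and the records as JSON lines.

    Hypothetical helper: `records` is any iterable of JSON-serializable
    dicts. Returns the number of records written.
    """
    with open(schema_path, "w") as schema_file:
        schema_file.write(schema_json)

    count = 0
    with open(records_path, "w") as out:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on
            # exactly one line, which line-oriented streaming jobs expect.
            out.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
    return count


def convert_avro_file(avro_path, schema_path, records_path):
    """Split one Avro data file into a schema file and a JSON-lines file.

    Assumption: the `avro` package is available; it is imported lazily so
    the helper above works without it.
    """
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    reader = DataFileReader(open(avro_path, "rb"), DatumReader())
    try:
        # The writer's schema is stored in the file header metadata.
        schema = reader.get_meta("avro.schema")
        if isinstance(schema, bytes):
            schema = schema.decode("utf-8")
        return dump_for_streaming(schema, reader, schema_path, records_path)
    finally:
        reader.close()
```

A call such as `convert_avro_file("events.avro", "events.avsc", "events.json")` (file names made up) would leave a JSON-lines file that a streaming mapper can read from stdin one record at a time. Records containing Avro `bytes` fields would need extra handling, since `json.dumps` cannot serialize them directly.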