Date: Wed, 15 Jun 2011 10:36:44 -0700
Subject: Re: Avro and Hadoop streaming
From: Miki Tebeka
To: user@avro.apache.org

Found the magic (-files and -libjars):

jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar
hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
    -files $jars \
    -libjars $jars \
    -input /in/avro \
    -output /out/avro \
    -mapper avro-mapper.py \
    -reducer avro-reducer.py \
    -file avro-mapper.py \
    -file avro-reducer.py \
    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat

Thanks for all the help!

On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey wrote:
> Hadoop has an old version of Avro in it. You must place the 1.6.0 jar
> (and relevant dependencies, or the avro-tools.jar with all dependencies
> bundled) in a location that gets picked up first in the task classpath.
>
> Packaging it in the job jar works. I'm not sure if putting it in the
> distributed cache and loading it as a library that way would.
>
> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke" wrote:
>
>>You have to package it in the job jar file under a /lib directory.
>>
>>
>>On 6/15/11 9:26 AM, "Miki Tebeka" wrote:
>>
>>> Still didn't work.
>>>
>>> I'm pretty new to hadoop world, I probably need to place the avro jar
>>> somewhere on the classpath of the nodes,
>>> however I have no idea how to do that.
>>>
>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J wrote:
>>>> Miki,
>>>>
>>>> You'll need to provide the entire canonical class name
>>>> (org.apache.avro.mapred.AvroAsTextInputFormat).
>>>>
>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka wrote:
>>>>> Greetings,
>>>>>
>>>>> I've tried to run a job with the following command:
>>>>>
>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>     -input /in/avro \
>>>>>     -output $out \
>>>>>     -mapper avro-mapper.py \
>>>>>     -reducer avro-reducer.py \
>>>>>     -file avro-mapper.py \
>>>>>     -file avro-reducer.py \
>>>>>     -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>     -inputformat AvroAsTextInputFormat
>>>>>
>>>>> However I get
>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>
>>>>> I'm probably missing something obvious to do.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks!
>>>>> --
>>>>> Miki
>>>>>
>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting wrote:
>>>>>> Miki,
>>>>>>
>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>
>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>
>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>
>>>>>> Are these perhaps what you're looking for?
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>>> My plan is to write an inputformat class that emits json records,
>>>>>>> one per line. This way the streaming application can read one record per
>>>>>>> line.
>>>>>>>
>>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>>>
>>>>>>> I couldn't find any documentation/help about writing inputformat
>>>>>>> classes. Can someone point me to the right direction?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Miki
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Harsh J
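The thread never shows what avro-mapper.py contains. Here is a minimal sketch of one possibility. It assumes AvroAsTextInputFormat hands the streaming mapper one JSON-encoded Avro datum per line, carried in the key with an empty value, so a trailing tab may follow the JSON. The mapper decodes each line and emits tab-separated key/count pairs. The field name KEY_FIELD = "user" is hypothetical; substitute a field from your own schema.

```python
import json
import sys
from typing import Iterable, Iterator, Tuple

# Hypothetical field to group by; the thread does not show the Avro schema.
KEY_FIELD = "user"


def parse_line(line: str) -> dict:
    """Decode one input line into a Python dict.

    Assumes the JSON text arrives in the key, possibly followed by a tab and
    an empty value, so everything after the first tab is discarded.
    """
    json_text = line.rstrip("\r\n").split("\t", 1)[0]
    return json.loads(json_text)


def map_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (key, 1) for every non-blank input record."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines rather than failing on them
        record = parse_line(line)
        yield str(record.get(KEY_FIELD)), 1


def main() -> None:
    # Hadoop streaming treats the text before the first tab as the output key.
    for key, count in map_lines(sys.stdin):
        sys.stdout.write(f"{key}\t{count}\n")


if __name__ == "__main__":
    main()
```

Keeping the parsing in plain functions, with stdin handling confined to main(), lets you exercise the mapper logic without a Hadoop cluster.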
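The thread does not show avro-reducer.py either. Below is a hypothetical reducer for a mapper that emits `key<TAB>count` lines. That line format is an assumption, not something the thread specifies. Hadoop streaming delivers reducer input sorted by key, so lines that share a key arrive adjacent and can be summed with itertools.groupby.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Split assumed 'key<TAB>count' lines into (key, count) tuples."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        key, _, value = line.partition("\t")
        yield key, int(value)


def reduce_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Sum counts per key; relies on the input already being sorted by key."""
    for key, group in groupby(parse_pairs(lines), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def main() -> None:
    for key, total in reduce_lines(sys.stdin):
        sys.stdout.write(f"{key}\t{total}\n")


if __name__ == "__main__":
    main()
```

To try the scripts outside Hadoop, a pipeline such as `python avro-mapper.py < records.txt | LC_ALL=C sort | python avro-reducer.py` roughly approximates the map, shuffle, and reduce phases. Here records.txt is a hypothetical file of JSON lines.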