avro-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Avro and Hadoop streaming
Date Wed, 15 Jun 2011 16:53:46 GMT
Hadoop has an old version of Avro in it.  You must place the 1.6.0 jar
(and relevant dependencies, or the avro-tools.jar with all dependencies
bundled) in a location that gets picked up first in the task classpath.

Packaging it in the job jar works. I'm not sure if putting it in the
distributed cache and loading it as a library that way would.
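As a rough illustration of the "package it in the job jar" approach described above and in the quoted reply below: a jar is a zip archive, so a dependency can be stored under lib/ with the standard zipfile module. The function name and file names here are hypothetical, and this sketch only shows the packaging step. It does not confirm that this alone resolves the streaming error quoted further down.

```python
import shutil
import zipfile
from pathlib import Path


def add_lib_to_job_jar(job_jar: Path, dep_jar: Path, out_jar: Path) -> Path:
    """Copy job_jar to out_jar and store dep_jar inside it as lib/<file name>.

    Hypothetical helper for illustration; out_jar must differ from job_jar.
    """
    shutil.copyfile(job_jar, out_jar)
    # A jar is a zip archive; append mode adds one entry and leaves the rest intact.
    with zipfile.ZipFile(out_jar, "a") as jar:
        jar.write(dep_jar, arcname=f"lib/{dep_jar.name}")
    return out_jar


# Example call (paths are placeholders, not taken from the thread):
# add_lib_to_job_jar(Path("my-job.jar"), Path("avro-tools.jar"), Path("my-job-with-avro.jar"))
```

The original jar is copied rather than modified in place so the unpatched file remains available if the job still fails.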

On 6/15/11 9:30 AM, "Matt Pouttu-Clarke"
<Matt.Pouttu-Clarke@icrossing.com> wrote:

>You have to package it in the job jar file under a /lib directory.
>
>
>On 6/15/11 9:26 AM, "Miki Tebeka" <miki.tebeka@gmail.com> wrote:
>
>> Still didn't work.
>> 
>> I'm pretty new to the Hadoop world. I probably need to place the Avro jar
>> somewhere on the classpath of the nodes,
>> but I have no idea how to do that.
>> 
>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <harsh@cloudera.com> wrote:
>>> Miki,
>>> 
>>> You'll need to provide the entire canonical class name
>>> (org.apache.avro.mapred...).
>>> 
>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <miki.tebeka@gmail.com> wrote:
>>>> Greetings,
>>>> 
>>>> I've tried to run a job with the following command:
>>>> 
>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>    -input /in/avro \
>>>>    -output $out \
>>>>    -mapper avro-mapper.py \
>>>>    -reducer avro-reducer.py \
>>>>    -file avro-mapper.py \
>>>>    -file avro-reducer.py \
>>>>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>    -inputformat AvroAsTextInputFormat
>>>> 
>>>> However I get
>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>> 
>>>> I'm probably missing something obvious to do.
>>>> 
>>>> Any ideas?
>>>> 
>>>> Thanks!
>>>> --
>>>> Miki
>>>> 
>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <cutting@apache.org> wrote:
>>>>> Miki,
>>>>> 
>>>>> Have you looked at AvroAsTextInputFormat?
>>>>> 
>>>>> 
>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>> 
>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>> 
>>>>> Are these perhaps what you're looking for?
>>>>> 
>>>>> Doug
>>>>> 
>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>> Greetings,
>>>>>> 
>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>> My plan is to write an inputformat class that emits json records,
>>>>>> one per line. This way the streaming application can read one
>>>>>> record per line.
>>>>>> 
>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>> 
>>>>>> I couldn't find any documentation/help about writing inputformat
>>>>>> classes. Can someone point me to the right direction?
>>>>>> 
>>>>>> Thanks,
>>>>>> --
>>>>>> Miki
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Harsh J
>>> 
>
>
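For the streaming side of the original question (one JSON record per line), a mapper could look roughly like the sketch below. It assumes each input line carries the record's JSON text as the key, optionally followed by a tab and an empty value, and that each record is a JSON object. The function name map_lines and the "name" field are hypothetical and not taken from the thread.

```python
import json


def map_lines(lines):
    """Yield one 'key<TAB>1' output line per JSON record found in lines."""
    for line in lines:
        # Streaming hands the mapper "key<TAB>value"; the JSON is assumed to be the key.
        text = line.rstrip("\n").split("\t", 1)[0]
        if not text:
            continue  # skip blank lines
        record = json.loads(text)
        # "name" is a hypothetical field; use a field from your own schema.
        yield f"{record.get('name', '')}\t1"


# In an actual mapper script, wire this to the standard streams, e.g.:
#   import sys
#   for out in map_lines(sys.stdin):
#       print(out)
```

Keeping the parsing in a generator that takes any iterable of lines makes it easy to try the logic on a few sample lines before submitting a job.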

