avro-user mailing list archives

From Matt Pouttu-Clarke <Matt.Pouttu-Cla...@icrossing.com>
Subject Re: Avro and Hadoop streaming
Date Wed, 15 Jun 2011 16:30:15 GMT
You have to package the Avro jar inside the job jar file, under a /lib directory.
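
A jar file is a zip archive, so the packaging step can be scripted. The following is a minimal sketch, not a confirmed fix for this thread: the helper name add_lib_jars and the file names in the usage note are hypothetical, and which jar Hadoop treats as the job jar depends on how the job is submitted.

```python
import os
import zipfile


def add_lib_jars(job_jar, dependency_jars):
    """Copy each dependency jar into job_jar under lib/ and list the lib/ entries.

    job_jar is created if it does not exist yet. Entries that are already
    present under lib/ are skipped, so the call can be repeated safely.
    """
    with zipfile.ZipFile(job_jar, "a") as jar:
        existing = set(jar.namelist())
        for dep in dependency_jars:
            # Store only the file name under lib/, not the local directory path.
            arcname = "lib/" + os.path.basename(dep)
            if arcname not in existing:
                jar.write(dep, arcname)
                existing.add(arcname)
    # Re-open read-only to report what the archive now holds under lib/.
    with zipfile.ZipFile(job_jar) as jar:
        return sorted(n for n in jar.namelist() if n.startswith("lib/"))
```

For example, add_lib_jars("job.jar", ["avro-mapred-1.6.0-SNAPSHOT.jar"]) would store the dependency as lib/avro-mapred-1.6.0-SNAPSHOT.jar inside job.jar; both paths are placeholders.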


On 6/15/11 9:26 AM, "Miki Tebeka" <miki.tebeka@gmail.com> wrote:

> Still didn't work.
> 
> I'm pretty new to the Hadoop world. I probably need to place the Avro jar
> somewhere on the classpath of the nodes,
> but I have no idea how to do that.
> 
> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <harsh@cloudera.com> wrote:
>> Miki,
>> 
>> You'll need to provide the entire canonical class name
>> (org.apache.avro.mapred...).
>> 
>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <miki.tebeka@gmail.com> wrote:
>>> Greetings,
>>> 
>>> I've tried to run a job with the following command:
>>> 
>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>    -input /in/avro \
>>>    -output $out \
>>>    -mapper avro-mapper.py \
>>>    -reducer avro-reducer.py \
>>>    -file avro-mapper.py \
>>>    -file avro-reducer.py \
>>>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>    -inputformat AvroAsTextInputFormat
>>> 
>>> However I get
>>> -inputformat : class not found : AvroAsTextInputFormat
>>> 
>>> I'm probably missing something obvious.
>>> 
>>> Any ideas?
>>> 
>>> Thanks!
>>> --
>>> Miki
>>> 
>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <cutting@apache.org> wrote:
>>>> Miki,
>>>> 
>>>> Have you looked at AvroAsTextInputFormat?
>>>> 
>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>> 
>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>> 
>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>> 
>>>> Are these perhaps what you're looking for?
>>>> 
>>>> Doug
>>>> 
>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>> Greetings,
>>>>> 
>>>>> I'd like to use hadoop streaming with Avro files.
>>>>> My plan is to write an inputformat class that emits json records, one
>>>>> per line. This way the streaming application can read one record per
>>>>> line.
>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>> 
>>>>> I couldn't find any documentation or help on writing InputFormat
>>>>> classes. Can someone point me in the right direction?
>>>>> 
>>>>> Thanks,
>>>>> --
>>>>> Miki
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Harsh J
>> 
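
The plan in the original message (an input format that emits one JSON record per line for the streaming program to read) implies a mapper along these lines. This is a sketch under assumptions: it assumes each input line carries the JSON text as the key, possibly followed by a tab and an empty value, and the field name "name" is hypothetical.

```python
import json
import sys


def parse_records(lines):
    """Yield one decoded JSON record per non-empty input line."""
    for line in lines:
        # Assumption: the JSON text is the streaming key and the value is
        # empty, so a trailing tab and newline may follow the JSON.
        text = line.rstrip("\t\r\n")
        if not text:
            continue
        yield json.loads(text)


def map_records(lines, out=sys.stdout, field="name"):
    """Write '<field value><TAB>1' for each record; `field` is hypothetical."""
    for record in parse_records(lines):
        out.write("%s\t1\n" % record.get(field, ""))

# A real mapper script would end by calling: map_records(sys.stdin)
```

Keeping the parsing in a generator means the mapper handles one record at a time and never holds the whole input split in memory.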




