hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Hadoop's Avro dependencies.
Date Wed, 22 Aug 2012 07:46:41 GMT
Bertrand,

For the inter-node part, we've been using Writables as well, but from
2.x onwards it has switched to Protocol Buffers. And you're right, this
shouldn't interfere with user tasks if they choose to ship a different
protobuf version along with them.
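
For anyone unfamiliar with the Writable approach: a Writable type serializes itself by writing its fields to a java.io.DataOutput and reading them back, in the same order, from a java.io.DataInput. The sketch below imitates that two-method contract (write / readFields, as in org.apache.hadoop.io.Writable) using only the JDK, so it runs without Hadoop on the classpath. SimpleWritable and PageView are hypothetical stand-ins, not Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

    // Hypothetical stand-in mirroring the two methods of org.apache.hadoop.io.Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type: a URL and a hit count.
    static class PageView implements SimpleWritable {
        String url = "";
        long hits;

        PageView() {} // created empty, then populated by readFields

        PageView(String url, long hits) {
            this.url = url;
            this.hits = hits;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(url);   // 2-byte length prefix + modified UTF-8 bytes
            out.writeLong(hits); // 8 bytes, big-endian
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            url = in.readUTF();  // fields are read back in the order they were written
            hits = in.readLong();
        }
    }

    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static PageView deserialize(byte[] data) {
        PageView pv = new PageView();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            pv.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pv;
    }

    public static void main(String[] args) {
        byte[] data = serialize(new PageView("/index.html", 42L));
        PageView copy = deserialize(data);
        // prints: /index.html 42 (21 bytes)
        System.out.println(copy.url + " " + copy.hits + " (" + data.length + " bytes)");
    }
}
```

Note that no field names or schema travel with the bytes: the reader has to know the exact type and field order, which is the compact but rigid trade-off that schema-based formats such as Avro and Protocol Buffers address.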

On Wed, Aug 22, 2012 at 11:57 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
> Also, if I am not mistaken, there are two kinds of serialization in Hadoop.
>
> There is the one used by MapReduce for transmitting user data: Writable by
> default, indeed.
>
> But there is also the one used for inter-node communication. I can't
> remember which one that is, though. Either way, it implies that you could
> have another dependency with no direct impact on the user.
>
> Regards
>
> Bertrand
>
>
> On Wed, Aug 22, 2012 at 8:16 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Hi,
>>
>> By default, only the Writable serialization technique is used. Only if you
>> choose to use Avro in your job is Avro serialization used at the
>> intermediate serialization step.
>>
>> On Wed, Aug 22, 2012 at 11:42 AM, Rahul Bhattacharjee
>> <rahul.rec.dgp@gmail.com> wrote:
> >> > Well, thanks a lot, Harsh. I thought Avro was a result of Hadoop's
> >> > serialization needs.
> >> >
> >> > If Avro isn't used for serializing map outputs and transferring them to
> >> > reducers, then what is used for this?
>> >
>> > Thanks,
>> > Rahul
>> >
>> > On Wed, Aug 22, 2012 at 11:22 AM, Harsh J <harsh@cloudera.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Hadoop doesn't use Avro serialization on its own. However, Hadoop 2.x
>> >> does provide an AvroSerialization class you can use optionally to
>> >> serialize using Avro libraries, and the 2.x distribution does ship an
>> >> Avro jar along with it.
>> >>
>> >> On Wed, Aug 22, 2012 at 11:09 AM, Rahul Bhattacharjee
>> >> <rahul.rec.dgp@gmail.com> wrote:
>> >> > Hi,
>> >> >
> >> >> > I was going through the Apache Hadoop distribution's dependencies (the
> >> >> > jars in the lib folder) and I could not find avro-1.x.x.jar.
> >> >> >
> >> >> > I thought Hadoop internally used Avro as its serialization mechanism for
> >> >> > intermediate data transmission (transporting map outputs to reducers,
> >> >> > etc.), so the Hadoop distribution must have Avro within it. But it doesn't!
>> >> >
>> >> > Can someone enlighten me on this?
>> >> >
>> >> > Thanks,
>> >> > Rahul
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Bertrand Dechoux
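
On the point that Avro serialization only kicks in if a job opts into it: as far as I know, Hadoop's SerializationFactory walks the Serialization classes listed in the io.serializations property and uses the first one whose accept(Class) returns true for the key or value class, so Writable types go to WritableSerialization and only Avro types would reach an Avro serialization. The JDK-only sketch below imitates that first-match lookup. SerializationChoice, the marker interfaces, and the two sample classes are hypothetical stand-ins, not Hadoop or Avro APIs.

```java
import java.util.List;

public class SerializationSelectionSketch {

    // Hypothetical stand-in for the accept(Class) check on Hadoop's Serialization interface.
    interface SerializationChoice {
        String name();
        boolean accept(Class<?> c);
    }

    // Hypothetical marker interfaces standing in for Writable and an Avro record type.
    interface WritableLike {}
    interface AvroRecordLike {}

    static class IntWritableLike implements WritableLike {}
    static class UserRecord implements AvroRecordLike {}

    // Builds a choice that accepts any class assignable to the given marker type.
    static SerializationChoice byMarker(String name, Class<?> marker) {
        return new SerializationChoice() {
            public String name() { return name; }
            public boolean accept(Class<?> c) { return marker.isAssignableFrom(c); }
        };
    }

    // Registered serializations, checked in order (analogous to the io.serializations list).
    static final List<SerializationChoice> REGISTERED = List.of(
            byMarker("WritableSerialization", WritableLike.class),
            byMarker("AvroSerialization", AvroRecordLike.class));

    // Returns the name of the first registered serialization that accepts c, or null if none does.
    static String pick(Class<?> c) {
        for (SerializationChoice s : REGISTERED) {
            if (s.accept(c)) {
                return s.name();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(pick(IntWritableLike.class)); // WritableSerialization
        System.out.println(pick(UserRecord.class));      // AvroSerialization
        System.out.println(pick(String.class));          // null
    }
}
```

The design choice this illustrates is that serialization is pluggable per type: registering an Avro serialization changes nothing for jobs whose keys and values are Writables.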



-- 
Harsh J
