kafka-users mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Avro serialization
Date Tue, 20 Aug 2013 18:37:41 GMT
OK, someone answered a similar question in the Avro forum.

It *sounds* like the Avro messages sent to Kafka are wrapped in, or prepended with, the
SHA of the schema, which the consumer uses to look up the schema. That makes more sense.
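The prepended-fingerprint idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical layout of one marker byte plus a 16-byte MD5 schema fingerprint in front of the Avro-encoded bytes; `frame`, `unframe`, `MAGIC_BYTE` and the dict-based lookup are illustrative names, not LinkedIn's format or any Kafka/Avro API. The point it shows is that the header has a fixed size, so the consumer can read it without knowing any schema.

```python
import hashlib
from typing import Tuple

MAGIC_BYTE = 0x00      # hypothetical format/version marker (an assumption)
FINGERPRINT_LEN = 16   # an MD5 digest is 16 bytes, like the "<16 bytes>" hash in the thread
HEADER_LEN = 1 + FINGERPRINT_LEN


def schema_fingerprint(schema_json: str) -> bytes:
    """Return the 16-byte MD5 digest of the schema text.

    Simplification: the Avro spec recommends fingerprinting the schema's
    Parsing Canonical Form rather than its raw JSON text.
    """
    return hashlib.md5(schema_json.encode("utf-8")).digest()


def frame(fingerprint: bytes, avro_payload: bytes) -> bytes:
    """Prepend the marker byte and the schema fingerprint to an Avro-encoded payload."""
    if len(fingerprint) != FINGERPRINT_LEN:
        raise ValueError("fingerprint must be 16 bytes")
    return bytes([MAGIC_BYTE]) + fingerprint + avro_payload


def unframe(message: bytes) -> Tuple[bytes, bytes]:
    """Split a framed message into (fingerprint, payload); no schema is needed for this."""
    if len(message) < HEADER_LEN or message[0] != MAGIC_BYTE:
        raise ValueError("not a framed message")
    return message[1:HEADER_LEN], message[HEADER_LEN:]


# Usage: the dict stands in for a real schema repository.
schema = '{"type": "record", "name": "PageView", "fields": [{"name": "url", "type": "string"}]}'
fp = schema_fingerprint(schema)
schemas_by_fp = {fp: schema}

# Avro binary encoding of {"url": "/index"}: zigzag length 6 (0x0c), then the UTF-8 bytes.
msg = frame(fp, b"\x0c/index")

got_fp, payload = unframe(msg)         # read the fixed-size header first...
writer_schema = schemas_by_fp[got_fp]  # ...then fetch the schema needed to decode `payload`
```

A fixed-size binary prefix like this avoids wrapping every message in JSON; only the payload after the header needs the schema.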

On Aug 20, 2013, at 11:09 AM, Mark <static.void.dev@gmail.com> wrote:

> Thanks, Jay. I've already read the paper and the JIRA ticket (haven't read the code), but I'm
still confused about how to integrate this with Kafka.
> 
> Say we write an Avro message (the message contains a SHA of the schema) to Kafka and
a consumer pulls off this message. How does the consumer know how to deserialize the message
even far enough to reach the SHA it needs to look up the full schema? Would this require wrapping
all messages in another type of message, like JSON { hash: <16 bytes>, message: <Avro
encoded message in bytes> }?
> 
> On Aug 20, 2013, at 9:33 AM, Jay Kreps <jay.kreps@gmail.com> wrote:
> 
>> This paper has more information on what we are doing at LinkedIn:
>> http://sites.computer.org/debull/A12june/pipeline.pdf
>> 
>> This Avro JIRA has a schema repository implementation similar to the one
>> LinkedIn uses:
>> https://issues.apache.org/jira/browse/AVRO-1124
>> 
>> -Jay
>> 
>> 
>> On Tue, Aug 20, 2013 at 7:08 AM, Mark <static.void.dev@gmail.com> wrote:
>> 
>>> Can someone break down how message serialization would work with Avro?
>>> I've read that instead of adding a schema to every single event, it would be wise
>>> to add some sort of fingerprint to each message to identify which schema
>>> it should use. What I'm having trouble understanding is how we read
>>> the fingerprint without a schema. Don't we need the schema to deserialize?
>>> The same question goes for working with Hadoop: how does the input format
>>> know which schema to use?
>>> 
>>> Thanks
> 
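Jay's pointer to a schema repository and the Hadoop question fit together: once every message carries a fixed-size fingerprint header, a batch job reading raw Kafka messages can resolve each message's writer schema before decoding anything. A minimal sketch, assuming each message is one marker byte (0x00, an assumption) followed by a 16-byte MD5 schema fingerprint and the Avro payload; `InMemorySchemaRepository` and `group_by_writer_schema` are hypothetical names, not the AVRO-1124 API or a Hadoop InputFormat.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List

FINGERPRINT_LEN = 16
HEADER_LEN = 1 + FINGERPRINT_LEN  # assumed layout: 1 marker byte + 16-byte MD5 fingerprint


class InMemorySchemaRepository:
    """Hypothetical stand-in for a schema repository; not the AVRO-1124 API."""

    def __init__(self) -> None:
        self._schemas: Dict[bytes, str] = {}

    def register(self, schema_json: str) -> bytes:
        # MD5 of the raw schema text; the Avro spec recommends hashing the
        # Parsing Canonical Form so schemas differing only in formatting match.
        fp = hashlib.md5(schema_json.encode("utf-8")).digest()
        self._schemas.setdefault(fp, schema_json)
        return fp

    def lookup(self, fingerprint: bytes) -> str:
        return self._schemas[fingerprint]  # raises KeyError for unknown fingerprints


def group_by_writer_schema(
    repo: InMemorySchemaRepository, messages: Iterable[bytes]
) -> Dict[str, List[bytes]]:
    """Bucket framed messages by writer schema so each bucket can be decoded with one schema."""
    groups: DefaultDict[str, List[bytes]] = defaultdict(list)
    for msg in messages:
        if len(msg) < HEADER_LEN or msg[0] != 0x00:
            raise ValueError("not a framed message")
        fingerprint, payload = msg[1:HEADER_LEN], msg[HEADER_LEN:]
        groups[repo.lookup(fingerprint)].append(payload)
    return dict(groups)


# Usage: two versions of the same record schema in one batch.
repo = InMemorySchemaRepository()
v1 = '{"type": "record", "name": "PageView", "fields": [{"name": "url", "type": "string"}]}'
v2 = ('{"type": "record", "name": "PageView", "fields": '
      '[{"name": "url", "type": "string"}, {"name": "ms", "type": "long"}]}')
fp1, fp2 = repo.register(v1), repo.register(v2)
# Payloads are Avro-encoded bytes, treated as opaque here.
batch = [b"\x00" + fp1 + b"\x02a", b"\x00" + fp2 + b"\x02b\x02", b"\x00" + fp1 + b"\x02c"]
groups = group_by_writer_schema(repo, batch)
```

A real job would hand each bucket to an Avro decoder configured with that writer schema. Avro object container files, by contrast, store the writer's schema in the file header, so this kind of lookup is mainly needed for raw message streams.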

