avro-user mailing list archives

From: Adrian Hains <aha...@gmail.com>
Subject: Re: GenericData.Record vs specific generated avro object
Date: Thu, 20 Feb 2014 16:52:58 GMT
If the avro data from flume has the schema:
{"type":"record","name":"Event","fields":[{"name":"
headers","type":{"type":"map","values":"string"}},{"name":"
body","type":"bytes"}]}
then a record can only contain a headers map of strings and a body field of
bytes. I don't see how it could contain structured data in the body like you
described:
{"headers": {"timestamp": "1392825607332", "parentnode":
"2014021909\/1392825638009"},
"body": {"bytes": "{"row":"000372d8","data":{"x1":"v1","x2":"v2","x3":"v3"},
"timestamp":1392380848474}"}}

Typically your flume event carries your data payload in that body field as
an opaque blob. So if you have a flume hdfs sink that logs the raw flume
event with a config of serializer=avro_event, then you would need to unpack
the data in the body field manually in your mapreduce job. If you instead
want the hdfs sink to write your payload in your custom avro format, then I
think you would need to configure the sink with the appropriate serializer
(e.g.
https://github.com/cloudera/cdk/blob/master/cdk-flume-avro-event-serializer/src/main/java/org/apache/flume/serialization/AvroEventSerializer.java
).
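
For the first approach (unpacking the body manually), the mapper would look
roughly like the untested sketch below. It assumes the body bytes are a
UTF-8 JSON string like your sample and uses Jackson to parse it; any JSON
library would do, and the EventMapper name and Text output are just for
illustration:

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.charset.StandardCharsets;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.mapred.AvroKey;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import com.fasterxml.jackson.databind.JsonNode;
  import com.fasterxml.jackson.databind.ObjectMapper;

  public class EventMapper
      extends Mapper<AvroKey<GenericData.Record>, NullWritable, Text, Text> {

    private final ObjectMapper json = new ObjectMapper();

    @Override
    protected void map(AvroKey<GenericData.Record> key, NullWritable value,
        Context context) throws IOException, InterruptedException {
      // The flume event record has exactly two fields: headers and body.
      ByteBuffer body = (ByteBuffer) key.datum().get("body");
      // Avro's generic reader hands back an array-backed ByteBuffer.
      String payload = new String(body.array(), body.position(),
          body.remaining(), StandardCharsets.UTF_8);
      // Unpack the JSON blob that flume stuffed into the body field.
      JsonNode node = json.readTree(payload);
      String row = node.get("row").asText();
      String x1 = node.get("data").get("x1").asText();
      long ts = node.get("timestamp").asLong();
      context.write(new Text(row), new Text(x1 + "@" + ts));
    }
  }

On the job side you would pair this with AvroKeyInputFormat and declare the
flume event schema via AvroJob.setInputKeySchema (both in avro-mapred's
org.apache.avro.mapreduce package).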

Apologies if I'm misunderstanding your problem and what you're trying to
accomplish.
-a



On Wed, Feb 19, 2014 at 9:52 PM, AnilKumar B <akumarb2010@gmail.com> wrote:

> Hi,
>
> I am trying to process avro data using MapReduce. The data I get in avro
> format is generated by flume with the schema below.
>
>
> {"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}
>
>
> And a sample of the data is below:
>
> {"headers": {"timestamp": "1392825607332", "parentnode": "2014021909\/1392825638009"},
> "body": {"bytes":
> "{"row":"000372d8","data":{"x1":"v1","x2":"v2","x3":"v3"},"timestamp":1392380848474}"}}
>
> When I try to use this data in MapReduce, I read it in the mapper as
> AvroKey<GenericData.Record>, NullWritable. I can see the whole message
> through key.datum(), but I am unable to access the fields like "row",
> "data", and "timestamp".
>
>
> So how can I resolve this? Do I need to generate a specific avro java
> class for the schema below and use that generated class in MapReduce, or
> should I use GenericData.Record itself?
>
>
> {
>   "namespace": "com.test.avro",
>   "type": "record",
>   "name": "Event",
>   "fields": [
>     {"name": "row", "type": "string"},
>     {"name": "data", "type": {"type": "map", "values": "string"}},
>     {"name": "timestamp", "type": "string"}
>   ]
> }
>
>
> Thanks & Regards,
> B Anil Kumar.
>
