hadoop-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: shifting sequenceFileOutput format to Avro format
Date Thu, 30 Jan 2014 19:14:56 GMT
In Avro, you need to think about a schema that matches your data. Avro's schema is very flexible
and should be able to store all kinds of data.
If you have a JSON string, you have 2 options to generate the Avro schema for it:
1) Use "type: string" to store the whole JSON string in Avro. This is the easiest, but
you have to parse the data later when you use it.
2) Use an Avro schema that matches your JSON data, using the matching Avro structures
for your data, like 'record', 'array', 'map', etc.
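
To make the two options concrete, here is a rough sketch of what each schema might look
like. It uses only Python's standard json module to build the schema documents, not the
Avro library itself. The record names (RawEvent, Event), the field names (json, id, tags,
attributes) and the sample data are all made up for illustration; your real schema depends
on your data.

```python
import json

# Option 1: a record with one string field that holds the whole JSON document.
# The record name "RawEvent" and field name "json" are hypothetical.
RAW_JSON_SCHEMA = {
    "type": "record",
    "name": "RawEvent",
    "fields": [{"name": "json", "type": "string"}],
}

# Option 2: a schema that mirrors the JSON structure with Avro complex types.
# All names here are hypothetical. A "map" suits keys that are not known in
# advance: Avro map keys are always strings and all values share one schema.
STRUCTURED_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
    ],
}


def to_avsc(schema):
    """Serialize a schema dict to the JSON text stored in an .avsc file."""
    return json.dumps(schema, indent=2)


# Made-up sample data with a nested list and arbitrary key/value attributes.
sample = {"id": "42", "tags": ["a", "b"], "attributes": {"color": "red", "size": "L"}}

# Option 1 datum: the JSON text goes into the single string field as-is,
# so every reader has to parse it again later.
option1_datum = {"json": json.dumps(sample)}

# Option 2 datum: the parsed structure lines up with the schema field by field.
option2_datum = sample

if __name__ == "__main__":
    print(to_avsc(RAW_JSON_SCHEMA))
    print(to_avsc(STRUCTURED_SCHEMA))
```

With option 1 the schema never changes, but the structure stays hidden inside the string.
With option 2 the schema has to be kept in step with the data, and the map field is one
way to hold keys that vary from record to record.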

Date: Fri, 31 Jan 2014 00:13:59 +0530
Subject: shifting sequenceFileOutput format to Avro format
From: akumarb2010@gmail.com
To: user@hadoop.apache.org

As of now, my jobs use SequenceFileOutputFormat and emit custom Java objects
as MR output.
Now I am planning to emit Avro format instead. I went through a few blogs but still have the following
questions:

1) My current custom Writable objects have a nested JSON format as their toString() output. So when I shift
to Avro format, should I just emit the JSON string in Avro format, instead of the custom Writable
objects?

2) If so, how can I create the schema? My JSON string is nested and will have random key/value
pairs.
3) Or can I still emit custom objects?

Thanks & Regards,
B Anil Kumar.
