hadoop-mapreduce-user mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: shifting sequenceFileOutput format to Avro format
Date Fri, 31 Jan 2014 02:57:27 GMT
Thanks Yong.

Thanks & Regards,
B Anil Kumar.


On Fri, Jan 31, 2014 at 12:44 AM, java8964 <java8964@hotmail.com> wrote:

> In Avro, you need to design a schema that matches your data. Avro's
> schema language is very flexible and should be able to store all kinds of data.
>
> If you have a JSON string, you have 2 options for the Avro schema:
>
> 1) Use "type": "string" to store the whole JSON string in Avro. This is
> the easiest, but you have to parse the data later when you use it.
> 2) Define an Avro schema that matches your JSON data, using the matching
> Avro structures such as record, array, and map.
>
> Yong
>
> ------------------------------
> Date: Fri, 31 Jan 2014 00:13:59 +0530
> Subject: shifting sequenceFileOutput format to Avro format
> From: akumarb2010@gmail.com
> To: user@hadoop.apache.org
>
>
> Hi,
>
> As of now, my jobs use SequenceFileOutputFormat and emit custom Java
> objects as MR output.
>
> Now I am planning to emit Avro format instead. I went through a few blogs but
> still have the following doubts.
>
> 1) My current custom Writable objects produce nested JSON from
> toString(). So when I shift to Avro format, should I just emit the JSON string
> in Avro format instead of the custom Writable object?
>
> 2) If so, how can I create the schema? My JSON string is nested and will have
> random key/value pairs.
>
> 3) Or can I still emit custom objects?
>
>
>
> Thanks & Regards,
> B Anil Kumar.
>
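
The two schema options Yong describes can be sketched as follows. This is only an illustration under assumptions: the record names (JsonPayload, Event), the field names (json, id, tags, attributes), and the two helper functions are hypothetical and do not come from the thread. Avro schemas are themselves JSON, so the sketch builds them as plain Python dicts with the standard library and does not call any Avro library. For the "random key/value pairs", it assumes an Avro map with string values, with non-string values kept as JSON text.

```python
import json

# Option 1: store the whole JSON document in a single string field.
# "JsonPayload" and "json" are hypothetical names for this sketch.
STRING_SCHEMA = {
    "type": "record",
    "name": "JsonPayload",
    "fields": [{"name": "json", "type": "string"}],
}

# Option 2: mirror the JSON structure with Avro record, array, and map types.
# "Event", "id", "tags", and "attributes" are hypothetical names.
STRUCTURED_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        # Avro map keys are always strings; arbitrary keys go here.
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
    ],
}

KNOWN_FIELDS = {"id", "tags"}


def to_string_datum(obj):
    """Option 1: serialize the whole object; readers must parse it later."""
    return {"json": json.dumps(obj, sort_keys=True)}


def to_structured_datum(obj):
    """Option 2: map known keys to typed fields, the rest into the map."""
    attributes = {}
    for key, value in obj.items():
        if key in KNOWN_FIELDS:
            continue
        # Assumption of this sketch: nested or non-string values are
        # kept as JSON text so that every map value is a string.
        attributes[key] = value if isinstance(value, str) else json.dumps(value)
    return {
        "id": str(obj["id"]),
        "tags": [str(tag) for tag in obj.get("tags", [])],
        "attributes": attributes,
    }


if __name__ == "__main__":
    sample = {"id": 7, "tags": ["a"], "color": "red", "size": {"w": 1}}
    print(json.dumps(STRING_SCHEMA, indent=2))
    print(to_string_datum(sample))
    print(json.dumps(STRUCTURED_SCHEMA, indent=2))
    print(to_structured_datum(sample))
```

The schema JSON produced by json.dumps could then be handed to whichever Avro library the job uses. Option 1 keeps the writer trivial but defers all parsing to readers; option 2 lets readers access typed fields directly, at the cost of deciding up front which keys are fixed and which go into the map.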
