hadoop-common-user mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: shifting sequenceFileOutput format to Avro format
Date Mon, 03 Feb 2014 18:36:50 GMT
Can anyone please suggest how to resolve this issue?

Thanks & Regards,
B Anil Kumar.


On Mon, Feb 3, 2014 at 9:34 AM, AnilKumar B <akumarb2010@gmail.com> wrote:

> Hi Yong,
>
> I followed your 2nd suggestion. My data format is nested (a list of maps),
> so I created the .avsc below.
>
> {"namespace": "test.avro",
>  "type": "record",
>  "name": "Session",
>  "fields": [
>    {"name":"VisitCommon", "type": {
>            "type": "map", "values":"string"}},
>    {"name":"events",
>     "type": {
>     "type": "array",
>     "items":{
>     "name":"Event",
>     "type":"map",
>     "values":"string"}
>     }
>     }
>  ]
> }
>
> I then tried generating the corresponding classes, both with the Avro tool
> and with the plugin, but the generated Java code has a few errors. What could
> be the issue?
>
> 1) Error: The method deepCopy(Schema,
> List<Map<CharSequence,CharSequence>>) is undefined for the type GenericData
> 2) I also noticed some deprecated code:
>  @Deprecated public
> java.util.Map<java.lang.CharSequence,java.lang.CharSequence> VisitCommon;
>
> I used the Eclipse plugin as described here:
> http://avro.apache.org/docs/1.7.6/mr.html
>
>
>
>
> Thanks & Regards,
> B Anil Kumar.
>
>
> On Fri, Jan 31, 2014 at 8:27 AM, AnilKumar B <akumarb2010@gmail.com> wrote:
>
>> Thanks Yong.
>>
>> Thanks & Regards,
>> B Anil Kumar.
>>
>>
>> On Fri, Jan 31, 2014 at 12:44 AM, java8964 <java8964@hotmail.com> wrote:
>>
>>> In Avro, you need to think about a schema that matches your data. Avro's
>>> schema is very flexible and should be able to store all kinds of data.
>>>
>>> If you have a Json string, you have 2 options to generate the Avro
>>> schema for it:
>>>
>>> 1) Use "type: string" to store the whole JSON string in Avro. This
>>> is the easiest, but you have to parse the data later when you use it.
>>> 2) Write an Avro schema that mirrors your JSON data, using the matching
>>> Avro types (record, array, map, etc.).
>>>
>>> Yong
>>>
>>> ------------------------------
>>> Date: Fri, 31 Jan 2014 00:13:59 +0530
>>> Subject: shifting sequenceFileOutput format to Avro format
>>> From: akumarb2010@gmail.com
>>> To: user@hadoop.apache.org
>>>
>>>
>>> Hi,
>>>
>>> Currently my jobs use SequenceFileOutputFormat and emit custom Java
>>> objects as MR output.
>>>
>>> I am now planning to emit Avro instead. I went through a few blogs
>>> but still have the following doubts.
>>>
>>> 1) My current custom Writable objects return nested JSON from
>>> toString(). When I shift to Avro, should I just emit the JSON string
>>> in Avro format instead of the custom Writable object?
>>>
>>> 2) If so, how can I create the schema? My JSON string is nested and will
>>> have arbitrary key/value pairs.
>>>
>>> 3) Or can I still emit custom objects?
>>>
>>>
>>>
>>> Thanks & Regards,
>>> B Anil Kumar.
>>>
>>
>>
>
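
As a rough sketch of Yong's second option: the schema quoted above declares VisitCommon as a map of strings and events as an array of maps of strings, so a nested session with arbitrary key/value pairs first has to be reduced to that shape. The plain-Java example below has no Avro dependency. The class name SessionShape, its helper methods, and the sample keys are hypothetical and are not part of Avro or of the original job. It only shows one way to coerce values to strings; as far as I know, the resulting Map and List are the kinds of Java collections that Avro's generic API expects for map and array fields, but treat that as an assumption to verify against your Avro version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Avro or of the original job.
class SessionShape {

    // Coerce arbitrary values to strings so the result fits a field declared as
    // {"type": "map", "values": "string"}. Null values are skipped because the
    // schema does not declare the map values as nullable.
    static Map<String, String> toStringMap(Map<String, ?> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : raw.entrySet()) {
            if (e.getValue() != null) {
                out.put(e.getKey(), String.valueOf(e.getValue()));
            }
        }
        return out;
    }

    // Convert a list of raw event maps into the shape of the "events" field:
    // an array whose items are maps of strings.
    static List<Map<String, String>> toEvents(List<? extends Map<String, ?>> rawEvents) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, ?> raw : rawEvents) {
            out.add(toStringMap(raw));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data with made-up keys, standing in for one parsed session.
        Map<String, Object> visit = new LinkedHashMap<>();
        visit.put("visitId", 42);
        visit.put("country", "IN");

        Map<String, Object> click = new LinkedHashMap<>();
        click.put("type", "click");
        click.put("x", 10.5);

        List<Map<String, Object>> events = new ArrayList<>();
        events.add(click);

        System.out.println(toStringMap(visit));
        System.out.println(toEvents(events));
    }
}
```

The trade-off of this shape is that every value becomes a string, so numeric or nested values have to be parsed again by whoever reads the Avro output.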
