hadoop-common-user mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: shifting sequenceFileOutput format to Avro format
Date Mon, 03 Feb 2014 04:04:38 GMT
Hi Yong,

I followed your second suggestion. My data is nested (a list of maps),
so I created the .avsc below.

{"namespace": "test.avro",
 "type": "record",
 "name": "Session",
 "fields": [
   {"name": "VisitCommon",
    "type": {"type": "map", "values": "string"}},
   {"name": "events",
    "type": {
      "type": "array",
      "items": {
        "name": "Event",
        "type": "map",
        "values": "string"}
    }
   }
 ]
}
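A hand-written .avsc is easy to leave with an unclosed `{` or `[`, which only surfaces later as a parse error from the Avro tools. As a quick pre-check, here is a minimal JDK-only sketch that verifies braces and brackets nest correctly, skipping anything inside JSON string literals. The class name `BraceCheck` is hypothetical, and it does not otherwise validate JSON or the Avro schema rules.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: checks that {} and [] are properly nested in a JSON text. */
final class BraceCheck {

    /** Returns true if every '{' / '[' outside string literals is closed in the right order. */
    static boolean isBalanced(String json) {
        Deque<Character> open = new ArrayDeque<>();
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Inside a string literal: only track escapes and the closing quote.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"': inString = true; break;
                case '{': case '[': open.push(c); break;
                case '}':
                    if (open.isEmpty() || open.pop() != '{') return false;
                    break;
                case ']':
                    if (open.isEmpty() || open.pop() != '[') return false;
                    break;
                default: break;
            }
        }
        // Balanced only if nothing is left open and no string literal is unterminated.
        return open.isEmpty() && !inString;
    }

    public static void main(String[] args) {
        // A field object whose closing brace is missing: the ']' meets an open '{'.
        String missingBrace =
            "{\"fields\": [ {\"name\":\"VisitCommon\", \"type\": {\"type\": \"map\", \"values\":\"string\"}, "
            + "{\"name\":\"events\"} ]}";
        System.out.println(isBalanced(missingBrace)); // false
    }
}
```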

I then tried generating the corresponding classes with both avro-tools and
the plugin, but the generated Java code has a few errors. What could be the
issue?

1) Error: The method deepCopy(Schema, List<Map<CharSequence,CharSequence>>)
is undefined for the type GenericData
2) I also noticed some deprecated code in the generated class:
 @Deprecated public java.util.Map<java.lang.CharSequence,java.lang.CharSequence> VisitCommon;

I used the Eclipse plugin as described at:
http://avro.apache.org/docs/1.7.6/mr.html
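The thread does not say which avro jar the Eclipse project compiles against. One possible cause of a "method is undefined" error in generated code (an assumption, not confirmed here) is that the classes were generated for one Avro release while an older avro jar elsewhere on the build path is used to compile them. Here is a JDK-only diagnostic sketch that prints which jar supplies `GenericData` and whether that copy has a public `deepCopy(Schema, Object)` method. The class `ClasspathCheck` and its methods are hypothetical. It relies only on reflection, so it runs even when Avro is absent and then reports "(not on classpath)".

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class comes from and whether a method exists. */
final class ClasspathCheck {

    /** Returns the jar/directory a class was loaded from, or a short note if unknown. */
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            URL where = (src == null) ? null : src.getLocation();
            return (where == null) ? "(bootstrap or unknown)" : where.toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    /** True if className declares or inherits a public method with these parameter types. */
    static boolean hasPublicMethod(String className, String name, String... paramTypes) {
        try {
            Class<?> cls = Class.forName(className);
            Class<?>[] params = new Class<?>[paramTypes.length];
            for (int i = 0; i < paramTypes.length; i++) {
                params[i] = Class.forName(paramTypes[i]);
            }
            cls.getMethod(name, params); // throws NoSuchMethodException if absent
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String gd = "org.apache.avro.generic.GenericData";
        System.out.println("GenericData loaded from: " + locate(gd));
        // Assumption: deepCopy is declared generically as <T> T deepCopy(Schema, T),
        // so after erasure reflection sees it as deepCopy(Schema, Object).
        System.out.println("deepCopy(Schema, Object) present: "
            + hasPublicMethod(gd, "deepCopy", "org.apache.avro.Schema", "java.lang.Object"));
    }
}
```

If the printed location is not the avro jar you expected, that mismatch is a reasonable first thing to rule out.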




Thanks & Regards,
B Anil Kumar.


On Fri, Jan 31, 2014 at 8:27 AM, AnilKumar B <akumarb2010@gmail.com> wrote:

> Thanks Yong.
>
> Thanks & Regards,
> B Anil Kumar.
>
>
> On Fri, Jan 31, 2014 at 12:44 AM, java8964 <java8964@hotmail.com> wrote:
>
>> In Avro, you need to design a schema that matches your data. Avro's
>> schema is very flexible and should be able to store all kinds of data.
>>
>> If you have a JSON string, you have two options for building the Avro
>> schema for it:
>>
>> 1) Use "type": "string" to store the whole JSON string in Avro. This is
>> the easiest, but you have to parse the data later when you use it.
>> 2) Write an Avro schema that mirrors your JSON data, using Avro's matching
>> complex types (record, array, map, etc.).
>>
>> Yong
>>
>> ------------------------------
>> Date: Fri, 31 Jan 2014 00:13:59 +0530
>> Subject: shifting sequenceFileOutput format to Avro format
>> From: akumarb2010@gmail.com
>> To: user@hadoop.apache.org
>>
>>
>> Hi,
>>
>> Currently my jobs use SequenceFileOutputFormat and emit custom Java
>> objects as MR output.
>>
>> Now I am planning to emit Avro instead. I went through a few blogs
>> but still have the following doubts.
>>
>> 1) The toString() of my current custom Writable objects produces nested
>> JSON. So when I switch to Avro, should I just emit the JSON string in
>> Avro format instead of the custom Writable object?
>>
>> 2) If so, how can I create a schema? My JSON string is nested and will
>> have arbitrary key/value pairs.
>>
>> 3) Or can I still emit custom objects?
>>
>>
>>
>> Thanks & Regards,
>> B Anil Kumar.
>>
>
>
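Yong's first option keeps the Avro schema to a single string field and leaves parsing to whoever reads the data. If the Writable's toString() already yields JSON, that string can be written as-is. Where the data is instead held as a map plus a list of maps, here is a minimal JDK-only sketch of building that one JSON string. The class `SessionJson` and its methods are hypothetical, key order follows each map's iteration order, and a production job would more likely use an established JSON library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: renders nested session data as one JSON string (option 1). */
final class SessionJson {

    /** Escapes a Java string as a JSON string literal, including the surrounding quotes. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be written as \\uXXXX escapes in JSON.
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Renders a string-to-string map as a JSON object, in the map's iteration order. */
    static String mapToJson(Map<String, String> m) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : m.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    /** Builds {"VisitCommon":{...},"events":[{...},...]} from the two nested parts. */
    static String toJson(Map<String, String> visitCommon, List<Map<String, String>> events) {
        StringBuilder sb = new StringBuilder("{\"VisitCommon\":");
        sb.append(mapToJson(visitCommon)).append(",\"events\":[");
        for (int i = 0; i < events.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(mapToJson(events.get(i)));
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> visit = new LinkedHashMap<>();
        visit.put("ip", "1.2.3.4");
        Map<String, String> click = new LinkedHashMap<>();
        click.put("type", "click");
        List<Map<String, String>> events = new ArrayList<>();
        events.add(click);
        // Prints {"VisitCommon":{"ip":"1.2.3.4"},"events":[{"type":"click"}]}
        System.out.println(toJson(visit, events));
    }
}
```

With this approach the Avro schema reduces to a record with one field of "type": "string". Option 2 also copes with arbitrary key/value pairs, because an Avro map fixes only the value type, not the key names (map keys are always strings).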
