hadoop-mapreduce-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: shifting sequenceFileOutput format to Avro format
Date Tue, 04 Feb 2014 19:07:26 GMT
Hi, Kumar:
I suggest that in the future you ask Avro questions on the Avro mailing list, which you can
subscribe to here:
http://avro.apache.org/mailing_lists.html
About your schema: you missed one "}" in your file, the one that closes the VisitCommon field.
With it added, the schema compiles:
yzhang$ more test.avsc
{"namespace": "test.avro", "type": "record", "name": "Session", "fields": [
   {"name":"VisitCommon", "type": {"type": "map", "values":"string"}},
   {"name":"events",
    "type": {
        "type": "array",
        "items": {
            "name":"Event",
            "type":"map",
            "values":"string"}
        }
   }
]}
yzhang$ java -jar ~/lib/avro-tools-1.7.6.jar compile schema test.avsc output/
Input files to compile:  test.avsc
yzhang$ ls -ls output/test/avro/Session.java
16 -rw-r--r--  1 yzhang  staff  7371 Feb  4 14:05 output/test/avro/Session.java
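
Yong spotted the missing brace by eye. For a longer .avsc, a bracket-balance scan can point at the spot. The sketch below is a hypothetical helper (AvscBraceCheck is not part of avro-tools or Avro): it only checks that { and [ outside string literals are closed in the right order, and it does not validate JSON syntax or the Avro schema itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AvscBraceCheck {

    /**
     * Returns null if every '{' and '[' outside string literals is closed in the
     * right order, otherwise a short description of the first problem found.
     * Hypothetical helper: it does not validate JSON syntax or Avro semantics.
     */
    public static String check(String json) {
        Deque<Character> open = new ArrayDeque<>();
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                      // skip the escaped character
                } else if (c == '"') {
                    inString = false;         // end of string literal
                }
                continue;
            }
            if (c == '"') {
                inString = true;              // brackets inside strings are ignored
            } else if (c == '{' || c == '[') {
                open.push(c);
            } else if (c == '}' || c == ']') {
                char expected = (c == '}') ? '{' : '[';
                if (open.isEmpty() || open.pop() != expected) {
                    return "unexpected '" + c + "' at offset " + i;
                }
            }
        }
        if (inString) {
            return "unterminated string literal";
        }
        if (!open.isEmpty()) {
            return open.size() + " unclosed bracket(s), innermost '" + open.peek() + "'";
        }
        return null;
    }

    public static void main(String[] args) {
        // Schema as first posted: the VisitCommon field is missing its closing '}'.
        String posted = "{\"namespace\": \"test.avro\", \"type\": \"record\", \"name\": \"Session\", \"fields\": ["
            + " {\"name\":\"VisitCommon\", \"type\": {\"type\": \"map\", \"values\":\"string\"},"
            + " {\"name\":\"events\", \"type\": {\"type\": \"array\", \"items\":"
            + " {\"name\":\"Event\", \"type\":\"map\", \"values\":\"string\"}}} ]}";
        // Corrected version: add the '}' that closes the VisitCommon field.
        String fixed = posted.replace("\"string\"},", "\"string\"}},");

        System.out.println("posted:    " + check(posted));
        String result = check(fixed);
        System.out.println("corrected: " + (result == null ? "OK" : result));
    }
}
```

On the schema as first posted it reports an unexpected ']' (the fields array closes while the VisitCommon field is still open); on the corrected schema it prints OK.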
Date: Tue, 4 Feb 2014 22:22:53 +0530
Subject: Re: shifting sequenceFileOutput format to Avro format
From: akumarb2010@gmail.com
To: user@hadoop.apache.org

I tried different versions of avro-maven-plugin (1.7.5 and 1.7.6) with JDK 1.7.0_45,
but I am still unable to resolve it.
The error message is below:

[ERROR] symbol:   method deepCopy(org.apache.avro.Schema,java.util.Map<java.lang.CharSequence,java.lang.CharSequence>)
[ERROR] location: class org.apache.avro.generic.GenericData
Thanks & Regards,
B Anil Kumar.
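
The thread does not record how this error was resolved. The message says the GenericData class seen by the compiler has no deepCopy(Schema, ...) method that the generated code calls, which suggests (an assumption, not confirmed here) that an older Avro jar is being picked up instead of the version matching the code generator; `mvn dependency:tree` shows which dependency brings Avro in. The hypothetical helper below (AvroClasspathCheck is not an Avro or Hadoop class) uses only JDK reflection to report which jar GenericData is loaded from and whether it has that method, assuming the Avro 1.7.x signature <T> T deepCopy(Schema, T), which erases to deepCopy(Schema, Object). Run it with the same classpath the build compiles against.

```java
import java.security.CodeSource;

public class AvroClasspathCheck {

    // Loads a class without running its static initializer.
    private static Class<?> load(String name) throws ClassNotFoundException {
        return Class.forName(name, false, AvroClasspathCheck.class.getClassLoader());
    }

    /** True if the class is loadable and has a public method with these (reference) parameter types. */
    public static boolean hasPublicMethod(String className, String methodName, String... paramTypeNames) {
        try {
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = load(paramTypeNames[i]);
            }
            load(className).getMethod(methodName, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Jar or directory a class came from, "bootstrap" for JDK classes, or "not found". */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = load(className).getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null) ? "bootstrap" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String gd = "org.apache.avro.generic.GenericData";
        System.out.println("GenericData loaded from: " + loadedFrom(gd));
        // Assumes the Avro 1.7.x signature <T> T deepCopy(Schema, T), erased to (Schema, Object).
        System.out.println("deepCopy(Schema, Object) available: "
            + hasPublicMethod(gd, "deepCopy", "org.apache.avro.Schema", "java.lang.Object"));
    }
}
```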

On Tue, Feb 4, 2014 at 12:06 AM, AnilKumar B <akumarb2010@gmail.com> wrote:

Can anyone please suggest how to resolve this issue?
Thanks & Regards,
B Anil Kumar.

On Mon, Feb 3, 2014 at 9:34 AM, AnilKumar B <akumarb2010@gmail.com> wrote:

Hi Yong,
I followed your second suggestion. My data is nested (a list of maps), so I created the .avsc
below.
{"namespace": "test.avro",
 "type": "record", "name": "Session", "fields": [
   {"name":"VisitCommon", "type": {
        "type": "map", "values":"string"},
   {"name":"events", "type": {
        "type": "array",
        "items": {
            "name":"Event", "type":"map", "values":"string"}
        }
   }
 ]}
Then I tried generating the corresponding classes with avro-tools and with the plugin, but the
generated Java code has a few errors. What could be the issue?

1) Error: The method deepCopy(Schema, List<Map<CharSequence,CharSequence>>) is undefined for the type GenericData
2) I also noticed some deprecated code:
   @Deprecated public java.util.Map<java.lang.CharSequence,java.lang.CharSequence> VisitCommon;

I used the Eclipse plugin, as described here: http://avro.apache.org/docs/1.7.6/mr.html

Thanks & Regards,
B Anil Kumar.

On Fri, Jan 31, 2014 at 8:27 AM, AnilKumar B <akumarb2010@gmail.com> wrote:

Thanks, Yong.
Thanks & Regards,
B Anil Kumar.

On Fri, Jan 31, 2014 at 12:44 AM, java8964 <java8964@hotmail.com> wrote:

In Avro, you need to design a schema that matches your data. Avro's schema language is very
flexible and should be able to describe almost any kind of data.
If you have a JSON string, you have two options for its Avro schema:

1) Use "type": "string" to store the whole JSON string in Avro. This is the easiest option, but
you have to parse the data later when you use it.
2) Write an Avro schema that mirrors your JSON data, using the matching Avro structures for it,
such as record, array, and map.
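
For the question further down this thread about nested JSON with random key/value pairs, option 2 works because an Avro map accepts arbitrary string keys with a single value type, while a record needs fixed field names. The sketch below illustrates that correspondence under stated assumptions; it is not an Avro API. The hypothetical AvroShapeSketch.avroTypeFor walks an already-parsed JSON value (Java Map/List/String/Long/Boolean/Double), treats every JSON object as an Avro map, maps JSON integers to "long", and rejects mixed value types, which would need an Avro union. The keys visitId and action in main are made up.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AvroShapeSketch {

    /** Returns the Avro type, as JSON text, for a value from an already-parsed JSON document. */
    public static String avroTypeFor(Object value) {
        if (value instanceof String) return "\"string\"";
        if (value instanceof Boolean) return "\"boolean\"";
        if (value instanceof Integer || value instanceof Long) return "\"long\"";
        if (value instanceof Double) return "\"double\"";
        if (value instanceof Map) {
            // JSON object with arbitrary keys -> Avro map (string keys, one value type).
            return "{\"type\":\"map\",\"values\":" + commonType(((Map<?, ?>) value).values()) + "}";
        }
        if (value instanceof List) {
            // JSON array -> Avro array (one item type).
            return "{\"type\":\"array\",\"items\":" + commonType((List<?>) value) + "}";
        }
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    // Every element must map to the same Avro type; an empty collection defaults
    // to "string" (an assumption of this sketch).
    private static String commonType(Collection<?> elements) {
        String common = null;
        for (Object e : elements) {
            String t = avroTypeFor(e);
            if (common == null) {
                common = t;
            } else if (!common.equals(t)) {
                throw new IllegalArgumentException(
                    "mixed types " + common + " and " + t + " would need an Avro union");
            }
        }
        return common == null ? "\"string\"" : common;
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like Anil's data: a map plus a list of maps.
        Map<String, String> visitCommon = new LinkedHashMap<>();
        visitCommon.put("visitId", "v-1");
        Map<String, String> event = new LinkedHashMap<>();
        event.put("action", "click");
        List<Map<String, String>> events = Collections.singletonList(event);

        String schema = "{\"namespace\":\"test.avro\",\"type\":\"record\",\"name\":\"Session\",\"fields\":["
            + "{\"name\":\"VisitCommon\",\"type\":" + avroTypeFor(visitCommon) + "},"
            + "{\"name\":\"events\",\"type\":" + avroTypeFor(events) + "}]}";
        System.out.println(schema);
    }
}
```

For this sample, main prints a schema with the same structure as the Session schema Yong compiled (a map of strings plus an array of maps of strings). It has no "name" on the map items, because in Avro only record, enum, and fixed are named types.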

Yong

Date: Fri, 31 Jan 2014 00:13:59 +0530
Subject: shifting sequenceFileOutput format to Avro format
From: akumarb2010@gmail.com
To: user@hadoop.apache.org

Hi,
Currently my jobs use SequenceFileOutputFormat and emit custom Java objects as MR output.

Now I am planning to emit the output in Avro format. I went through a few blogs but still have
the following questions:

1) The toString() of my current custom Writable objects produces nested JSON. So when I switch
to Avro, should I just emit the JSON string in Avro format instead of the custom Writable
object?
2) If so, how can I create the schema? My JSON string is nested and will have arbitrary key/value
pairs.
3) Or can I still emit custom objects?

Thanks & Regards,
B Anil Kumar.