avro-user mailing list archives

From François Kawala <fkaw...@bestofmedia.com>
Subject Re: How to declare an optional field
Date Thu, 07 Jun 2012 08:48:57 GMT
Hello,

First of all, thanks for your help. I've corrected my schema according to your
advice, but I still run into the same kind of issue:

------------------------------------------------------------------------

With this schema:

(...)
{"name": "in_reply_to", "type": ["null", "long"], "default": null},
(...)
{"name": "urls", "type": ["null", {"type": "array", "items": (record)}]}
(...)


Using this schema, the following data:

{"created_at": "Mon, 28 May 2012 00:01:25 +0000", "emitter": 405427230, "emitter_name": "CallmeOceane_",
"geo": null, "hashtags": null,* "in_reply_to": 206897508021055489*, 
"lang": "fr", "msg": "@Chloe_OneD Aaaah puuuuutain j'ai toujours pas finis Wild Souls machin
truc", "uid": 206897932501385217, "urls": null, "usermentions": 
[{"id": 288136906, "indices": [0, 11], "name": "Happiness \u10e6", "screen_name": "Chloe_OneD"}]}|

fails with this error:

2012-06-07 10:16:07,831 WARN org.apache.hadoop.streaming.PipeMapRed: org.apache.avro.AvroTypeException:
Expected start-union. Got VALUE_NUMBER_INT
	at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:460)
	at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:418)
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
	at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:102)
	at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:88)
	at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:446)
	at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
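
One thing I'm wondering about (this is just a guess): the trace points at
org.apache.avro.io.JsonDecoder, and if I understand Avro's JSON encoding of unions
correctly, a non-null union value has to be written as a single-key object naming the
branch rather than as a bare value, i.e. something like:

{"in_reply_to": {"long": 206897508021055489}}

which would explain the "Expected start-union. Got VALUE_NUMBER_INT" on the bare number.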


------------------------------------------------------------------------

Whereas with this data:

{"created_at": "Mon, 28 May 2012 00:00:10 +0000", "emitter": 59809965, "emitter_name": "Droolius",
"geo": null, "hashtags": null, *"in_reply_to": null*, "lang": "en", "msg": 
"RT @davidchang: Thank you again Amy Rowat &amp; team UCLA @scienceandfood : Umami Reverse
Engineering + The Joy of MSG http://t.co/nk1QBGbg", "uid": 206897616326377472, 
*"urls": [{"display_url": "bit.ly/KvD0QZ", "expanded_url": "http://bit.ly/KvD0QZ", "indices":
[119, 139], "url": "http://t.co/nk1QBGbg"}]*, 
"usermentions": [{"id": 221185711, "indices": [3, 14], "name": "Dave Chang", "screen_name":
"davidchang"}, 
{"id": 526175293, "indices": [58, 73], "name": "UCLA Science & Food", "screen_name": "scienceandfood"}]}|


It fails with:

2012-06-07 10:38:19,530 WARN org.apache.hadoop.streaming.PipeMapRed: org.apache.avro.AvroTypeException:
Expected start-union. Got START_ARRAY
	at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:460)
	at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:418)
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
	at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:102)
	at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:88)
	at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:446)
	at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
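
If that's right, the same would presumably apply to the non-null branch of the urls
union, which would have to be wrapped as:

{"urls": {"array": [{"display_url": "bit.ly/KvD0QZ", "expanded_url": "http://bit.ly/KvD0QZ", "indices": [119, 139], "url": "http://t.co/nk1QBGbg"}]}}

and that would explain the "Expected start-union. Got START_ARRAY" as well.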


------------------------------------------------------------------------

Judging from these error stacks, I guess my problem has something to do
with the custom output format, which relies on org.apache.avro.generic
(and consequently on the strict Java implementation). Am I right?
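
To narrow this down, I'll probably try to reproduce the decoding outside of Hadoop with
the plain Avro Java API, roughly along these lines (untested sketch; "tweet.avsc" and the
command-line datum are just placeholders for my schema file and one line of data):

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class JsonDecodeTest {
    public static void main(String[] args) throws Exception {
        // Parse the same schema that the MR job uses (placeholder path).
        Schema schema = new Schema.Parser().parse(new File("tweet.avsc"));

        // One JSON-encoded datum passed on the command line, exactly as the
        // streaming job would emit it (placeholder argument).
        String datum = args[0];

        // Decode the datum against the schema, as JsonDecoder does in the stack trace above.
        Decoder decoder = DecoderFactory.get().jsonDecoder(schema, datum);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
        GenericRecord record = reader.read(null, decoder);
        System.out.println(record);
    }
}

If that fails with the same "Expected start-union" error, then the custom output format is
probably not the culprit.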

All the best, and thanks again for reading :)

Regards,
François.




> According to the spec, the default value for a union is assumed to have 
> the type of the first element of the union.
>
> http://avro.apache.org/docs/current/spec.html#schema_record
>
> So some valid fields would be:
>
> {"name":"x", "type":["long", "null"], "default": 0}
> {"name":"y", "type":["null", "long"], "default": null}
>
> The following are invalid fields, since the type of the default value 
> does not match that of the first union element.
>
> {"name":"x", "type":["long", "null"], "default": null}
> {"name":"y", "type":["null", "long"], "default": 0}
>
> Python may not implement this strictly, but Java does.
>
> This is a common point of confusion.  We should probably document it 
> better.  I'm not sure whether it's causing the problem you're seeing, 
> but perhaps it is.
>
> Cheers,
>
> Doug
>
> On 06/06/2012 04:15 AM, François Kawala wrote:
> > Dear all,
> >
> > Despite my desperate efforts to get a working schema, I cannot manage to
> > specify that a field of a given record can be either "a given type" or
> > "null". I've tried with unions, but the back-end that I have to use seems
> > to be unhappy with it. More precisely: I'm trying to output the result
> > of a Streaming MR job within an Avro container. This job is written in
> > Python and executed through dumbo (http://www.dumbotics.com), and a
> > custom OutputFormat is used
> > (https://github.com/tomslabs/avro-utils/tree/master/src/main/java/com/tomslabs/grid/avro).
> >
> >
> > However, since this custom OutputFormat relies on org.apache.avro
> > sources, I thought this list could be a good place to call for help.
> >
> > Thanks for reading,
> > François.
> >
> > ------------------------------------------------------------------------
> >
> > Here are some complementary elements:
> >
> > Fragment of the schema that I think is responsible for my troubles:
> >
> > {"name": "in_reply_to", "type": [{"type": "long"},"null"], "default":"null"}
> >
> > I've also unsuccessfully tried:
> >
> > {"name": "in_reply_to", "type": [{"type": "long"},"null"]}
> > {"name": "in_reply_to", "type": ["null",{"type": "long"}]}
> >
> >     Each ending with the same error message:
> >
> >         org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
> >
> >     Error stack:
> >
> >     	at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:460)
> >     	at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:418)
> >     	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> >     	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >     	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> >     	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> >     	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> >     	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> >     	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> >     	at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:102)
> >     	at com.tomslabs.grid.avro.TextTypedBytesToAvroOutputFormat$AvroRecordWriter.write(TextTypedBytesToAvroOutputFormat.java:88)
> >     	at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:446)
> >     	at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
> >
