avro-user mailing list archives

From Arne Vogel <avo...@benocs.com>
Subject Re: setting default values in avro
Date Fri, 08 Jul 2016 13:17:48 GMT
Dear Yibing Shi,

the default value of a union field must match the schema of the first
union member. So to give a field an int default, declare its type as
["int", "null"] instead of ["null", "int"].

For more details, see the spec:
http://avro.apache.org/docs/1.8.1/spec.html#schema_complex
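
As a rough illustration of the rule above, here is a minimal sketch that checks whether a field's default matches the first union branch, and fills in missing fields from the schema defaults before a record is handed to a writer (the approach suggested earlier in this thread). It assumes Python 3 and uses only the standard json module, not the avro package. The schema variant with the default's type listed first and the helper names default_matches_first_branch and fill_defaults are hypothetical, written only for this example.

```python
import json

# Hypothetical variant of the thread's schema: each union lists the type of
# its default first, so the non-null defaults satisfy the rule.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "data",
  "namespace": "my.example",
  "fields": [
    {"name": "domain", "type": ["string", "null"], "default": "EMPTY"},
    {"name": "ip",     "type": ["string", "null"], "default": "EMPTY"},
    {"name": "port",   "type": ["int", "null"],    "default": 0},
    {"name": "score",  "type": ["int", "null"],    "default": 0}
  ]
}
"""

# Only the primitive types that appear in this thread are covered.
_PYTHON_TYPES = {"null": type(None), "string": str, "int": int}


def default_matches_first_branch(field):
    """Return True if the field's default fits the first union member."""
    if "default" not in field:
        return True  # nothing to check
    field_type = field["type"]
    first = field_type[0] if isinstance(field_type, list) else field_type
    default = field["default"]
    # bool is a subclass of int in Python, so reject it explicitly for "int".
    if first == "int" and isinstance(default, bool):
        return False
    return isinstance(default, _PYTHON_TYPES[first])


def fill_defaults(schema, record):
    """Return a copy of record with missing fields set to schema defaults."""
    filled = dict(record)
    for field in schema["fields"]:
        if field["name"] not in filled and "default" in field:
            filled[field["name"]] = field["default"]
    return filled


schema = json.loads(SCHEMA_JSON)
assert all(default_matches_first_branch(f) for f in schema["fields"])
print(fill_defaults(schema, {"ip": "1.2.3.4", "port": 80}))
```

A record completed this way contains every field named in the schema, so it should no longer be rejected merely because keys are missing. The check is deliberately narrow and does not replace the validation done by an Avro library.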

Regards,
Arne Vogel

On 08.07.2016 14:51, Yibing Shi wrote:
> + Sean Busbey
>
> My understanding is that this problem is a limitation of the Python
> Avro library. Currently it seems that the only valid default value is
> "null". Please try the schema below to see whether it works for you.
>
> {
>     "type" : "record",
>     "name" : "data",
>     "namespace" : "my.example",
>     "fields" : [
>         {"name" : "domain", "type" : ["null", "string"], "default" : null},
>         {"name" : "ip", "type" : ["null", "string"], "default" : null},
>         {"name" : "port", "type" : ["null", "int"], "default" : null},
>         {"name" : "score", "type" : ["null", "int"], "default" : null}
>     ]
> }
>
> The JIRAs below seem to be related:
>
> https://issues.apache.org/jira/browse/AVRO-1265
> https://issues.apache.org/jira/browse/AVRO-1566
>
> I am pretty sure that the Avro Java library supports using a non-null
> default value for record fields. You can try it in a Java program.
>
>
> Yibing Shi
> Customer Operations Engineer
> <http://www.cloudera.com>
>
> On Fri, Jul 8, 2016 at 3:00 PM, Stanislav Savulchik
> <s.savulchik@gmail.com> wrote:
>
>     I'm not familiar enough with Avro to propose an "Avro
>     solution" for your problem :(
>
>     If you want to serialize default values into Avro for some fields
>     you should provide the default values in code explicitly when
>     writing to Avro. Another approach is to declare the fields as
>     nullable using union types (e.g. [null, int]) and use default
>     values in code explicitly when reading from Avro.
>
>     I believe the "default" key you used in Avro schema is meant for
>     schema evolution
>     http://avro.apache.org/docs/current/spec.html#Schema+Resolution
>
>       * if the reader's record schema has a field that contains a
>         default value, and writer's schema does not have a field with
>         the same name, then the reader should use the default value
>         from its field.
>
>
>     On Fri, Jul 8, 2016 at 9:52, Sarvagya Pant
>     <sarvagya.pant@gmail.com> wrote:
>
>         Hi Stanislav,
>
>         Thanks for the reply. What I want to achieve is that data
>         arriving at the Avro writer may not contain all the fields
>         specified in the example above. I would like to save the
>         default value if possible, or retrieve the default value when
>         using DataFileReader. Is this possible? Should the data always
>         contain all the keys specified in the schema? I tried using
>         ["int", "null"], "default" : 0. This was able to save the
>         data when a field is not present, but using DataFileReader I
>         got None instead of the default value 0. Any help will be much
>         appreciated. Thanks.
>
>         On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik
>         <s.savulchik@gmail.com> wrote:
>
>             Hi,
>
>             I believe default values only work for readers, not writers.
>
>             Spec says that
>             (http://avro.apache.org/docs/current/spec.html):
>             > default: A default value for this field, used when
>             reading instances that lack this field (optional).
>
>>             On July 7, 2016, at 21:16, Sarvagya Pant
>>             <sarvagya.pant@gmail.com> wrote:
>>
>>             I am trying to use Avro to replace some code that
>>             writes data as CSV, because CSV cannot store the type
>>             of a field and all data is treated as strings when
>>             consumed. I have copied the Avro code from its website
>>             and would like to set a default value when a field is
>>             missing.
>>
>>             My avro file looks like this:
>>
>>             {
>>                 "type" : "record",
>>                 "name" : "data",
>>                 "namespace" : "my.example",
>>                 "fields" : [
>>                     {"name" : "domain", "type" : "string", "default" : "EMPTY"},
>>                     {"name" : "ip", "type" : "string", "default" : "EMPTY"},
>>                     {"name" : "port", "type" : "int", "default" : 0},
>>                     {"name" : "score", "type" : "int", "default" : 0}
>>                 ]
>>             }
>>
>>             I have written a simple python file that is expected to
>>             work. It is given below:
>>
>>             import avro.schema
>>             from avro.datafile import DataFileReader, DataFileWriter
>>             from avro.io import DatumReader, DatumWriter
>>
>>             schema = avro.schema.parse(open("data.avsc", "rb").read())
>>
>>             writer = DataFileWriter(open("users.avro", "wb"),
>>             DatumWriter(), schema)
>>             writer.append({"domain": "hello domain", "score" : 20,
>>             "port" : 8080})
>>             writer.append({"ip": "1.2.3.4", "port" : 80})
>>             writer.append({"domain": "another domain", "score" : 100})
>>             writer.close()
>>
>>             reader = DataFileReader(open("users.avro", "rb"),
>>             DatumReader())
>>             for data in reader:
>>                 print data
>>             reader.close()
>>
>>             However, when I run this program, I get an error saying
>>             that the data does not match the schema.
>>
>>                 Traceback (most recent call last):
>>               File "D:\arko.py", line 8, in <module>
>>             writer.append({"domain": "hello domain", "score" : 20,
>>             "port" : 8080})
>>               File "build\bdist.win32\egg\avro\datafile.py", line
>>             196, in append
>>               File "build\bdist.win32\egg\avro\io.py", line 769, in write
>>
>>             avro.io.AvroTypeException: The datum {'domain': 'hello
>>             domain', 'score': 20, 'port': 8080} is not an example of
>>             the schema {
>>               "namespace": "my.example",
>>               "type": "record",
>>               "name": "userInfo",
>>               "fields": [
>>                 {
>>                   "default": "EMPTY",
>>                   "type": "string",
>>                   "name": "domain"
>>                 },
>>                 {
>>                   "default": "EMPTY",
>>                   "type": "string",
>>                   "name": "ip"
>>                 },
>>                 {
>>                   "default": 0,
>>                   "type": "int",
>>                   "name": "port"
>>                 },
>>                 {
>>                   "default": 0,
>>                   "type": "int",
>>                   "name": "score"
>>                 }
>>               ]
>>             }
>>             [Finished in 0.1s with exit code 1]
>>
>>             I am using Avro v1.8.0 and Python 2.7. What am I doing
>>             wrong here? Thanks.
>>
>>             -- 
>>             Sarvagya Pant
>>             Kathmandu, Nepal
>
>
>
>
>         -- 
>         Sarvagya Pant
>         Kathmandu, Nepal
>
>

-- 
BENOCS GMBH
Arne Vogel
Winterfeldtstr. 21
10781 Berlin
Email: avogel@benocs.com
www.benocs.com

Board of Management: Michael Wolz, Dr.-Ing. Oliver Holschke, Dr.-Ing. Ingmar Poese
Commercial Register: Amtsgericht Bonn HRB 19378

