avro-user mailing list archives

From Sarvagya Pant <sarvagya.p...@gmail.com>
Subject Re: setting default values in avro
Date Fri, 08 Jul 2016 03:51:58 GMT
Hi Stanislav,

Thanks for the reply. What I want to achieve is that data arriving at the
Avro writer may not contain every field specified in the schema quoted
below. I would like to save the default value if possible, or retrieve the
default value when reading with DataFileReader. Is this possible? Must the
data always contain all the keys specified in the schema? I tried using
["int", "null"] with "default" : 0. With that, the data could be saved even
when a field was missing, but on reading with DataFileReader I got None
instead of the default value 0. Any help will be much appreciated. Thanks.
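
Since the reply quoted below says defaults are applied only when reading, one possible workaround is to fill in the defaults before each record reaches the writer. The following is only a minimal sketch under stated assumptions: it treats the schema as plain JSON, handles only top-level record fields, and `fill_defaults` is a hypothetical helper, not part of the avro package.

```python
import json

# Schema from the quoted message, handled here as plain JSON (no avro import).
SCHEMA_JSON = """
{
    "type": "record",
    "name": "data",
    "namespace": "my.example",
    "fields": [
        {"name": "domain", "type": "string", "default": "EMPTY"},
        {"name": "ip", "type": "string", "default": "EMPTY"},
        {"name": "port", "type": "int", "default": 0},
        {"name": "score", "type": "int", "default": 0}
    ]
}
"""


def fill_defaults(record, schema):
    """Return a copy of record with missing fields set to the schema defaults.

    Hypothetical helper for illustration; it is not an avro API.
    """
    filled = dict(record)  # copy, so the caller's dict is left untouched
    for field in schema["fields"]:
        name = field["name"]
        if name in filled:
            continue
        if "default" not in field:
            raise ValueError("field %r is missing and has no default" % name)
        filled[name] = field["default"]
    return filled


schema = json.loads(SCHEMA_JSON)
rows = [
    {"domain": "hello domain", "score": 20, "port": 8080},
    {"ip": "1.2.3.4", "port": 80},
    {"domain": "another domain", "score": 100},
]
complete_rows = [fill_defaults(row, schema) for row in rows]
for row in complete_rows:
    print(row)
```

Each completed dict could then be passed to writer.append(...) in place of the partial one, so every record written contains all four fields.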

On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik <s.savulchik@gmail.com>
wrote:

> Hi,
>
> I believe default values only work for readers, not writers.
>
> Spec says that (http://avro.apache.org/docs/current/spec.html):
> > default: A default value for this field, used when reading instances
> > that lack this field (optional).
>
> On 7 July 2016, at 21:16, Sarvagya Pant <sarvagya.pant@gmail.com>
> wrote:
>
> I am trying to use Avro to replace some code that writes data to CSV.
> CSV cannot store the type of a field, so all values are treated as
> strings when the data is consumed. I have copied the example code from
> the Avro website and would like a default value to be used when a field
> is missing.
>
> My Avro schema file (data.avsc) looks like this:
>
> {
>     "type" : "record",
>     "name" : "data",
>     "namespace" : "my.example",
>     "fields" : [
>         {"name" : "domain", "type" : "string", "default" : "EMPTY"},
>         {"name" : "ip", "type" : "string", "default" : "EMPTY"},
>         {"name" : "port", "type" : "int", "default" : 0},
>         {"name" : "score", "type" : "int", "default" : 0}
>     ]
> }
>
> I have written a simple Python script that I expected to work. It is
> given below:
>
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
>
> schema = avro.schema.parse(open("data.avsc", "rb").read())
>
> writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
> writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
> writer.append({"ip": "1.2.3.4", "port" : 80})
> writer.append({"domain": "another domain", "score" : 100})
> writer.close()
>
> reader = DataFileReader(open("users.avro", "rb"), DatumReader())
> for data in reader:
>     print data
> reader.close()
>
> However, when I run this program, I get an error saying that the datum
> does not match the schema.
>
> Traceback (most recent call last):
>   File "D:\arko.py", line 8, in <module>
>     writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
>   File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
>   File "build\bdist.win32\egg\avro\io.py", line 769, in write
>
> avro.io.AvroTypeException: The datum {'domain': 'hello domain', 'score':
> 20, 'port': 8080} is not an example of the schema {
>   "namespace": "my.example",
>   "type": "record",
>   "name": "userInfo",
>   "fields": [
>     {
>       "default": "EMPTY",
>       "type": "string",
>       "name": "domain"
>     },
>     {
>       "default": "EMPTY",
>       "type": "string",
>       "name": "ip"
>     },
>     {
>       "default": 0,
>       "type": "int",
>       "name": "port"
>     },
>     {
>       "default": 0,
>       "type": "int",
>       "name": "score"
>     }
>   ]
> }
> [Finished in 0.1s with exit code 1]
>
> I am using Avro v1.8.0 and Python 2.7. What am I doing wrong here? Thanks.
>
> --
>
> *Sarvagya Pant*
> *Kathmandu, Nepal*
>
>
>


-- 

*Sarvagya Pant*
*Kathmandu, Nepal*
