avro-user mailing list archives

From Sarvagya Pant <sarvagya.p...@gmail.com>
Subject setting default values in avro
Date Thu, 07 Jul 2016 15:16:36 GMT
I am trying to use Avro to replace some code that writes data to CSV. CSV
cannot store the type of a field, so every value is treated as a string when
the data is consumed. I have copied the Avro code from its website and would
like a default value to be used when a field is missing from a record.
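
To illustrate the CSV limitation mentioned above, here is a minimal sketch
using only the standard csv module. The sample header and row are made up for
the example and are not taken from my real data.

```python
import csv

# csv.reader and csv.DictReader accept any iterable of lines,
# so a small in-memory list stands in for a real file here.
lines = ["domain,port,score", "example.org,8080,20"]
rows = list(csv.DictReader(lines))

port = rows[0]["port"]
# The port comes back as text, not as an integer: the type is lost.
print("%s %r" % (type(port).__name__, port))
```

Every consumer has to convert such values back by hand, which is what I hope
a typed format like Avro avoids.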

My Avro schema file (data.avsc) looks like this:

    {
        "type" : "record",
        "name" : "data",
        "namespace" : "my.example",
        "fields" : [
            {"name" : "domain", "type" : "string", "default" : "EMPTY"},
            {"name" : "ip", "type" : "string", "default" : "EMPTY"},
            {"name" : "port", "type" : "int", "default" : 0},
            {"name" : "score", "type" : "int", "default" : 0}
        ]
    }

I have written a simple Python script that I expect to work. It is given
below:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema = avro.schema.parse(open("data.avsc", "rb").read())

writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
writer.append({"ip": "", "port" : 80})
writer.append({"domain": "another domain", "score" : 100})

reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for data in reader:
    print data

However, when I run this program, I get an error saying that the datum does
not match the schema:

Traceback (most recent call last):
  File "D:\arko.py", line 8, in <module>
    writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
  File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
  File "build\bdist.win32\egg\avro\io.py", line 769, in write

avro.io.AvroTypeException: The datum {'domain': 'hello domain', 'score':
20, 'port': 8080} is not an example of the schema {
  "namespace": "my.example",
  "type": "record",
  "name": "userInfo",
  "fields": [
    {
      "default": "EMPTY",
      "type": "string",
      "name": "domain"
    },
    {
      "default": "EMPTY",
      "type": "string",
      "name": "ip"
    },
    {
      "default": 0,
      "type": "int",
      "name": "port"
    },
    {
      "default": 0,
      "type": "int",
      "name": "score"
    }
  ]
}
[Finished in 0.1s with exit code 1]

I am using Avro v1.8.0 and Python 2.7. What am I doing wrong here? Thanks.
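
To make the behaviour I expected concrete, here is a minimal sketch in plain
Python that fills in missing fields from the schema defaults before a record
would be written. It does not call the avro library at all, and fill_defaults
is a hypothetical helper name of my own, not part of the avro package. I am
not claiming this is how Avro is meant to handle defaults, only that this is
the result I was hoping for.

```python
import json

# The schema from my data.avsc, inlined so the sketch is self-contained.
SCHEMA_JSON = """
{
    "type": "record",
    "name": "data",
    "namespace": "my.example",
    "fields": [
        {"name": "domain", "type": "string", "default": "EMPTY"},
        {"name": "ip", "type": "string", "default": "EMPTY"},
        {"name": "port", "type": "int", "default": 0},
        {"name": "score", "type": "int", "default": 0}
    ]
}
"""


def fill_defaults(record, schema):
    """Return a copy of record with missing fields set to the schema defaults.

    Hypothetical helper for illustration; works on the parsed JSON schema.
    """
    filled = dict(record)
    for field in schema["fields"]:
        name = field["name"]
        if name not in filled:
            if "default" not in field:
                raise ValueError("no value or default for field %r" % name)
            filled[name] = field["default"]
    return filled


schema = json.loads(SCHEMA_JSON)
full = fill_defaults({"ip": "", "port": 80}, schema)
print(full)
```

With a helper like this, each dictionary in my script would be passed through
fill_defaults before being handed to writer.append.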


*Sarvagya Pant*
*Kathmandu, Nepal*
