avro-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Confusion re. persisting the schema
Date Tue, 12 Oct 2010 06:04:45 GMT
You are simply writing encoded data with that code: the encoder emits the
serialized datums and nothing else, so no schema ends up in the file. You need
to use org.apache.avro.file.DataFileWriter to write proper Avro data files
(appending each datum to it); among other features, it stores the schema in
the file header.
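The DataFileWriter approach can be sketched like this. It is a minimal example, not code from the thread, and it assumes a current Avro Java release (it parses the schema with Schema.Parser, whereas the release discussed here used the static Schema.parse). The class name DataFileSketch, the temp-file location and the sample file names are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DataFileSketch {
    // The FileDependency schema from the question, inlined as a string.
    static final String SCHEMA_JSON =
        "{\"name\": \"FileDependency\", \"type\": \"record\", \"fields\": ["
        + "{\"name\": \"file\", \"type\": \"string\"},"
        + "{\"name\": \"imports\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}";

    // Write one record to an Avro data file; create() puts the schema in the header.
    static void write(File out, Schema schema, String file, List<String> imports)
            throws IOException {
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("file", file);
        rec.put("imports", imports);
        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
            new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, out);
        writer.append(rec);
        writer.close();
    }

    // Open the file without passing a schema; the reader takes it from the header.
    static Schema storedSchema(File in) throws IOException {
        DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
            in, new GenericDatumReader<GenericRecord>());
        try {
            return reader.getSchema();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        File out = File.createTempFile("deps", ".avro"); // hypothetical location
        write(out, schema, "Main.java", Arrays.asList("Util.java", "Config.java"));
        System.out.println(storedSchema(out).getName());
    }
}
```

Because create() writes the schema into the file header, the reader is built with a no-argument GenericDatumReader and still recovers the FileDependency schema through getSchema().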

On Oct 12, 2010 11:29 AM, "Christopher Hunt" <huntc@internode.on.net> wrote:

Hi there,

I've just noticed that when I write out my binary data I don't appear to
have a schema saved with it. I was under the impression that Avro saves
schemas along with the data. Thanks for any clarification.

Here's my schema:

  {
    "name": "FileDependency",
    "type": "record",
    "fields": [
        {"name": "file", "type": "string"},
        {"name": "imports", "type": {
            "type": "array", "items": "string"}
        }
    ]
  }

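To see why no schema shows up, it helps to look at what a raw binary encoding of one FileDependency record contains. The sketch below hand-rolls the binary rules from the Avro specification (zig-zag varint longs, length-prefixed UTF-8 strings, arrays written as counted blocks ending with a zero count, record fields concatenated in schema order) using only the JDK. It is an illustration, not the Avro library's encoder, and the class name RawAvroLayout is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RawAvroLayout {
    // Avro "long": zig-zag encode, then emit 7 bits per byte, low-order group first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro "string": byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // One FileDependency record: the field values in schema order, nothing else.
    static byte[] encode(String file, List<String> imports) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, file);
        if (!imports.isEmpty()) {
            writeLong(out, imports.size()); // a single block holding every item
            for (String imp : imports) {
                writeString(out, imp);
            }
        }
        writeLong(out, 0); // a zero-count block terminates the array
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("A", Arrays.asList("B", "C"));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // 02 41 04 02 42 02 43 00
    }
}
```

The eight bytes are field values only; nothing identifies the record name, the field names or their types. Per the specification, GenericDatumWriter writing the same record through a BinaryEncoder should produce the same layout, which is why a reader has to be given the schema separately.
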
The code to write out my data is as follows (I'd also appreciate any
suggestions for refinement, as I'm new to Avro):

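  import java.io.*;
  import java.util.Map;
  import java.util.Set;
  import org.apache.avro.Schema;
  import org.apache.avro.generic.*;
  import org.apache.avro.io.BinaryEncoder;
  import org.apache.avro.io.Encoder;
  import org.apache.avro.util.Utf8;

  // SCHEMA_RESOURCE and OUTPUT_FILE_NAME are hypothetical placeholders; the
  // original arguments are cut off in the archived message.
  InputStream fileDependencySchemaIs = this.getClass()
      .getResourceAsStream(SCHEMA_RESOURCE);
  Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

  GenericDatumWriter<GenericRecord> genericDatumWriter =
      new GenericDatumWriter<GenericRecord>(fileDependencySchema);
  OutputStream os = new FileOutputStream(new File(workFolder,
      OUTPUT_FILE_NAME));
  Encoder encoder = new BinaryEncoder(os);
  for (Map.Entry<String, Set<String>> entry : fileDependencies
      .entrySet()) {

    GenericRecord genericRecord = new GenericData.Record(
        fileDependencySchema);

    genericRecord.put("file", new Utf8(entry.getKey()));

    Set<String> imports = entry.getValue();
    // Capacity and element schema are reconstructed (truncated in the archive).
    GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
        imports.size(), fileDependencySchema.getField("imports").schema());
    for (String importFile : imports) {
      genericArray.add(new Utf8(importFile));
    }
    genericRecord.put("imports", genericArray);

    genericDatumWriter.write(genericRecord, encoder);
  }
  encoder.flush(); // push any buffered bytes to the file
  os.close();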

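The flip side of writing raw encoded records is that whatever reads them must already know the schema. A minimal sketch of both halves, assuming a current Avro Java release, where BinaryEncoder and BinaryDecoder instances come from EncoderFactory and DecoderFactory rather than the new BinaryEncoder(os) constructor used above. The class name RawRoundTrip and the in-memory byte array standing in for the output file are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class RawRoundTrip {
    // Same FileDependency schema as in the question.
    static final String SCHEMA_JSON =
        "{\"name\": \"FileDependency\", \"type\": \"record\", \"fields\": ["
        + "{\"name\": \"file\", \"type\": \"string\"},"
        + "{\"name\": \"imports\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}";

    // Encode records back to back with no header, as the loop above does.
    static byte[] writeRaw(Schema schema, List<GenericRecord> records) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
        for (GenericRecord r : records) {
            writer.write(r, encoder);
        }
        encoder.flush(); // this encoder buffers, so flush before taking the bytes
        return out.toByteArray();
    }

    // Nothing in the bytes names the schema, so the caller must supply it.
    static List<GenericRecord> readRaw(Schema schema, byte[] data) throws IOException {
        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(new ByteArrayInputStream(data), null);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
        List<GenericRecord> records = new ArrayList<GenericRecord>();
        while (!decoder.isEnd()) {
            records.add(reader.read(null, decoder));
        }
        return records;
    }
}
```

readRaw works only because the caller passes the schema the writer used; given a different one, the bytes would be misinterpreted or fail to decode, since nothing in the stream records how they were written. A data file written with DataFileWriter avoids this by carrying the schema in its header.
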
Thanks again.

Kind regards,
