avro-user mailing list archives

From Christopher Hunt <hu...@internode.on.net>
Subject Confusion re. persisting the schema
Date Tue, 12 Oct 2010 05:58:57 GMT
Hi there,

I've just noticed that when I write out my binary data I don't appear to have a schema saved
with it. I was under the impression that Avro saves schemas along with the data. Thanks for
any clarification.

Here's my schema:

{
  "name": "FileDependency", 
  "type": "record",
  "fields": [
      {"name": "file", "type": "string"},
      {"name": "imports", "type": {
          "type": "array", "items": "string"}
      }
    ]
}

The code to write out my data is as follows (I'd also appreciate any suggestions for
refining it, as I'm new to Avro):

  @Cleanup
  InputStream fileDependencySchemaIs = this.getClass()
      .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
  Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

  GenericDatumWriter<GenericRecord> genericDatumWriter = 
      new GenericDatumWriter<GenericRecord>(fileDependencySchema);
  @Cleanup
  OutputStream os = new FileOutputStream(new File(workFolder,
      FILE_DEPENDENCY_GRAPH_NAME));
  Encoder encoder = new BinaryEncoder(os);
  for (Map.Entry<String, Set<String>> entry : fileDependencies
      .entrySet()) {

    GenericRecord genericRecord = new GenericData.Record(
        fileDependencySchema);

    genericRecord.put("file", new Utf8(entry.getKey()));

    Set<String> imports = entry.getValue();
    GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
        imports.size(), 
        Schema.createArray(Schema.create(Type.STRING)));
    for (String importFile : imports) {
      genericArray.add(new Utf8(importFile));
    }
    genericRecord.put("imports", genericArray);

    genericDatumWriter.write(genericRecord, encoder);
  }
  encoder.flush();
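
For illustration, here is a minimal sketch of the bytes that a raw binary encoding would
produce for one FileDependency record, assuming the encoding rules in the Avro
specification (longs as zig-zag varints, strings as a length followed by UTF-8 bytes, arrays
as counted blocks ended by a zero count). RawAvroSketch and its methods are hypothetical
names that use only the JDK; this is not Avro's own code. The point is that nothing in the
output names the fields or the schema. As far as I understand it, the schema is stored only
when writing Avro's object container file format (DataFileWriter in the Java API), which
puts it in the file header.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Avro: hand-encodes one FileDependency
// record following the binary encoding rules in the Avro specification.
final class RawAvroSketch {

    // Avro long: zig-zag encode, then emit 7 bits per byte, low bits first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Record: field values in schema order, with no field names and no schema.
    static byte[] encodeFileDependency(String file, List<String> imports) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, file);
        if (!imports.isEmpty()) {
            writeLong(out, imports.size()); // one block holding all items
            for (String importFile : imports) {
                writeString(out, importFile);
            }
        }
        writeLong(out, 0); // a zero count ends the array
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeFileDependency("a", Arrays.asList("b"));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "02 61 02 02 62 00"
    }
}
```

A reader of these six bytes can only make sense of them with the FileDependency schema
already in hand, which matches what I see in my output file.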

Thanks again.

Kind regards,
Christopher