avro-user mailing list archives

From Eelco Hillenius <eelco.hillen...@gmail.com>
Subject mixing types using reflection
Date Sun, 16 Aug 2009 01:45:42 GMT
Hi all,

I'd like to adopt Avro to do audit logging. For this, I have a
hierarchy of audit events, like:

  |-- UserEvent
        |-- UserSessionStartedEvent
        |-- UserSessionEndedEvent
        |-- WorkspaceEvent
               |-- WorkspaceAccessedEvent

etc. I would like to write instances of these events to log files
(and then later move them to HDFS so that we can fire MR jobs at them).
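
As a rough illustration, the hierarchy above might map onto plain Java classes along these lines. The fields (`userId`, `workspaceId`) and the constructors are hypothetical, made up for this sketch; the message doesn't say what each event actually carries.

```java
/*
 * Hypothetical sketch of the audit event hierarchy described above.
 * Field names are illustrative assumptions, not taken from real code.
 */
class UserEvent {
    String userId; // assumed: every user event records who triggered it

    UserEvent() {} // no-arg constructor (assumed useful for reflective instantiation)

    UserEvent(String userId) { this.userId = userId; }
}

class UserSessionStartedEvent extends UserEvent {
    UserSessionStartedEvent() {}
    UserSessionStartedEvent(String userId) { super(userId); }
}

class UserSessionEndedEvent extends UserEvent {
    UserSessionEndedEvent() {}
    UserSessionEndedEvent(String userId) { super(userId); }
}

class WorkspaceEvent extends UserEvent {
    String workspaceId; // assumed: which workspace the event concerns

    WorkspaceEvent() {}

    WorkspaceEvent(String userId, String workspaceId) {
        super(userId);
        this.workspaceId = workspaceId;
    }
}

class WorkspaceAccessedEvent extends WorkspaceEvent {
    WorkspaceAccessedEvent() {}

    WorkspaceAccessedEvent(String userId, String workspaceId) {
        super(userId, workspaceId);
    }
}
```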

There are two showstoppers for me right now: AVRO-93 and AVRO-95.
My question is about the latter, which concerns mixing
multiple types in one data file using reflection. I submitted a unit
test for it that shows the bug, but I'm wondering whether I'm using
the API as intended.

Basically, I assume I can reuse the OutputStream and
ReflectDatumWriter instances for different types:

    FileOutputStream fos = new FileOutputStream(FILE);
    ReflectDatumWriter dout = new ReflectDatumWriter();

Then, for every type, I have:

    Schema fooSchema = ReflectData.getSchema(FooEvent.class);
    DataFileWriter<Object> fooWriter = new DataFileWriter<Object>(fooSchema,
        fos, dout);

    Schema barSchema = ReflectData.getSchema(BarEvent.class);
    DataFileWriter<Object> barWriter = new DataFileWriter<Object>(barSchema,
        fos, dout);

I don't know up front which events will end up in a file, and I would
like to write only the schemas I'm actually using. So I'm assuming
(hoping) I can get a schema on the fly, create a new writer for it,
and then cache that writer for as long as the file is open, reusing
it for every event of the same type. I'd end up with one writer per
type that has appeared in the file so far.
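
The caching scheme described above (one writer per event class, created on first use and reused while the file stays open) could look roughly like this. `WriterCache` and its factory function are hypothetical names for this sketch; the factory stands in for whatever builds the per-type `DataFileWriter`, and no Avro API is assumed here.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one writer per event class for the lifetime of
 * the current file. W is whatever writer type the factory produces.
 */
final class WriterCache<W> {
    // LinkedHashMap keeps writers in the order their types first appeared.
    private final Map<Class<?>, W> writers = new LinkedHashMap<>();
    private final Function<Class<?>, W> factory;

    WriterCache(Function<Class<?>, W> factory) {
        this.factory = factory;
    }

    /** Returns the cached writer for the event's runtime class, creating it on first use. */
    W writerFor(Object event) {
        return writers.computeIfAbsent(event.getClass(), factory);
    }

    /** All writers created so far, e.g. to flush them before rolling over. */
    Collection<W> all() {
        return writers.values();
    }

    /** Forgets every writer; intended to be called once the current file is closed. */
    void clear() {
        writers.clear();
    }
}
```

Keying on the runtime class means a `WorkspaceAccessedEvent` and a plain `WorkspaceEvent` would get separate writers, which matches the "one writer per type in the file" idea.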

Appending to the files:

    barWriter.append(new BarEvent("Cheers mate!"));
    fooWriter.append(new FooEvent(30));

Then when I'm about to rollover to a new file, I flush the writers and
close the output stream.
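
The rollover step (flush every writer, then close the shared stream) might be sketched as follows. `Rollover.rollover` is a hypothetical helper; it relies only on the JDK's `Flushable` and `Closeable` interfaces and assumes the writers buffer output that must be flushed before the underlying stream goes away.

```java
import java.io.Closeable;
import java.io.Flushable;
import java.io.IOException;

/** Hypothetical helper for the rollover step described above. */
final class Rollover {
    private Rollover() {}

    /**
     * Flushes each writer in order, then closes the shared stream.
     * The stream is closed even if a flush throws, so the file handle
     * is not leaked when a write fails.
     */
    static void rollover(Iterable<? extends Flushable> writers, Closeable stream)
            throws IOException {
        try {
            for (Flushable writer : writers) {
                writer.flush();
            }
        } finally {
            stream.close();
        }
    }
}
```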

I'd then hope to be able to read records in again like this:

    GenericDatumReader<Object> din = new GenericDatumReader<Object>();
    SeekableFileInput sin = new SeekableFileInput(FILE);
    DataFileReader<Object> reader = new DataFileReader<Object>(sin, din);

and with every call to reader.next(), get a properly instantiated and
populated object back, magically of course! :-)

Would using Avro like this be about right, and is it just a matter of
getting these bugs fixed? Or am I making some wrong assumptions and/or
should I go about this differently?

See the attachment on https://issues.apache.org/jira/browse/AVRO-93
for the full unit test.


