avro-user mailing list archives

From Martin Kleppmann <mkleppm...@linkedin.com>
Subject Re: Dynamic Schema
Date Wed, 02 Apr 2014 21:01:08 GMT
Hi Amit,

The Avro data file format requires the writer to know the schema from the start, because all
records in the file are then written with the same schema. So there probably isn't an alternative
to what you're doing -- to buffer as much as you can in memory, write it out to file when
the memory buffer is full, and then start a new file.
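As a rough illustration of that buffer-and-roll approach, here is a minimal Python sketch. It is not tied to any Avro library: `infer_schema`, `RollingWriter`, and the `write_file` callback are hypothetical names introduced for the example, and the type mapping only covers flat records with primitive values. In practice `write_file` would wrap whatever Avro data file writer you use.

```python
from typing import Any, Callable, Dict, List

# Assumed mapping from Python value types to Avro primitive type names.
AVRO_TYPES = {bool: "boolean", int: "long", float: "double",
              str: "string", bytes: "bytes"}


def infer_schema(records: List[Dict[str, Any]], name: str = "Row") -> Dict[str, Any]:
    """Build one record schema covering every field seen in the batch."""
    field_types: Dict[str, List[str]] = {}
    for rec in records:
        for key, value in rec.items():
            # Unsupported value types raise KeyError in this sketch.
            t = "null" if value is None else AVRO_TYPES[type(value)]
            types = field_types.setdefault(key, [])
            if t not in types:
                types.append(t)

    fields = []
    for key, types in field_types.items():
        # A field absent from some record must be nullable.
        if any(key not in rec for rec in records) and "null" not in types:
            types.append("null")
        if "null" in types:
            # Put "null" first so that a null default is valid for the union.
            union = ["null"] + [t for t in types if t != "null"]
            fields.append({"name": key, "type": union, "default": None})
        else:
            fields.append({"name": key,
                           "type": types[0] if len(types) == 1 else types})
    return {"type": "record", "name": name, "fields": fields}


class RollingWriter:
    """Buffer records in memory; when the buffer is full, hand the batch
    and its inferred schema to write_file, then start a fresh buffer."""

    def __init__(self, max_records: int,
                 write_file: Callable[[Dict[str, Any], List[Dict[str, Any]]], None]):
        self.max_records = max_records
        self.write_file = write_file  # hypothetical: writes one data file
        self.buffer: List[Dict[str, Any]] = []

    def append(self, record: Dict[str, Any]) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.write_file(infer_schema(self.buffer), self.buffer)
            self.buffer = []
```

Each call to `write_file` corresponds to one data file with a single schema. Depending on the Avro library, the callback may need to fill in missing nullable fields with an explicit null before writing.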

You can't change the schema of a data file once it has been written, but you can run a background
process which merges several data files together, and writes the result to a new file. You
can make the merged file's schema the union of all the input file schemas, or you can write
some application-specific code which combines the schemas into one, and evolve all the records
into that merged schema. This can be done by streaming through the files -- you don't need
to keep all the data in memory.
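To make the second option more concrete (combining the schemas into one and evolving the records), here is a minimal Python sketch. The helpers `merge_record_schemas` and `evolve` are hypothetical names, schemas are plain dicts in Avro's JSON form, and only flat record schemas are handled. Nested records and other named types would need more care.

```python
from typing import Any, Dict, Iterable, Iterator, List


def as_union(avro_type: Any) -> List[Any]:
    """Treat a non-union type as a union with a single branch."""
    return list(avro_type) if isinstance(avro_type, list) else [avro_type]


def merge_record_schemas(schemas: List[Dict[str, Any]],
                         name: str = "Merged") -> Dict[str, Any]:
    """Combine flat record schemas into one record schema."""
    branches_by_field: Dict[str, List[Any]] = {}
    seen_in: Dict[str, int] = {}
    for schema in schemas:
        for field in schema["fields"]:
            branches = branches_by_field.setdefault(field["name"], [])
            for t in as_union(field["type"]):
                if t not in branches:
                    branches.append(t)
            seen_in[field["name"]] = seen_in.get(field["name"], 0) + 1

    fields = []
    for fname, branches in branches_by_field.items():
        # A field missing from some input schema becomes nullable.
        if seen_in[fname] < len(schemas) and "null" not in branches:
            branches.append("null")
        if "null" in branches:
            # "null" goes first so that a null default is valid.
            union = ["null"] + [b for b in branches if b != "null"]
            fields.append({"name": fname, "type": union, "default": None})
        else:
            fields.append({"name": fname,
                           "type": branches[0] if len(branches) == 1 else branches})
    return {"type": "record", "name": name, "fields": fields}


def evolve(records: Iterable[Dict[str, Any]],
           schema: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Lazily reshape records to the merged schema, one at a time,
    filling absent fields with None."""
    names = [f["name"] for f in schema["fields"]]
    for rec in records:
        yield {n: rec.get(n) for n in names}
```

Because `evolve` is a generator, the merge job can read the schemas from the file headers first, then stream each input file through it into the output writer without holding the data in memory.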

Martin



On 1 Apr 2014, at 21:55, amit nanda <amitwip@gmail.com> wrote:
> I have very dynamic data that I want to write to an Avro file. The solution I have is
> to store all that data in memory, then calculate the schema, and then start writing.
>
> This causes the files to be smaller in size, because of the memory limitations.
>
> What I am looking for is to start writing data as and when it is collected, but how
> should I compute the schema in this case? Can I change the schema of an Avro file?
> 
> Thanks
> Amit

