avro-user mailing list archives

From Sachneet Singh Bains <sachneets.ba...@impetus.co.in>
Subject RE: Schema not getting saved along with Data
Date Wed, 26 Mar 2014 08:37:33 GMT
Hi Sean,

My use case is to store incoming data (from various sources) in a database like Cassandra. The
data will be serialized using Avro.
My questions are:

1. What is the best way to do this?

2. How should I keep the schema information along with each record? For example, two columns,
one storing the data and another storing the schema or its fingerprint?

3. I see fingerprints as one option, but how do I make use of them: where should the schema
repository be maintained, and how are fingerprints added to the data?

4. Also, I am wondering if there is any feature to automatically generate a schema
from incoming data (CSV format)?

5. Is there any recommended database for storing data in Avro format (relational or NoSQL)?

I know I have asked a lot of questions ☺. I will highly appreciate your response and suggestions.


From: Sean Busbey [mailto:busbey+lists@cloudera.com]
Sent: Wednesday, March 26, 2014 11:35 AM
To: user@avro.apache.org
Subject: Re: Schema not getting saved along with Data

Hi Sachneet!

Can you describe your use case a little?

Far and away the recommended way to use Avro is via one of the container files. The getting
started guide for Java will walk you through writing and reading via the default container


On Wed, Mar 26, 2014 at 12:55 AM, Sachneet Singh Bains <sachneets.bains@impetus.co.in> wrote:
Thanks a lot Eric, this was useful.

I was going through ‘Schema Fingerprints’. Are there any methods available (Java) that
I can use to write these fingerprints along with the data, rather than the complete schema?
I am looking for something like Writer.write(fingerprint, record).

What is the recommended way of using these fingerprints?


From: Eric Wasserman [mailto:ewasserman@247-inc.com]
Sent: Tuesday, March 25, 2014 9:56 PM
To: user@avro.apache.org
Subject: RE: Schema not getting saved along with Data

It's a "must do".

The real requirement is the reader of the serialized records must have *exactly* the schema
that was used to write the records. [Note: The reader may also, optionally, specify a different
reader's schema that it would like the Avro parser to use to translate the deserialized records

How you arrange for the parser to get the writer's schema varies with your usage. If you happen
to use the org.apache.avro.file.DataFileWriter then it will prefix the file with the schema
used to write all the records. The corresponding DataFileReader will use the prefixed schema
to properly deserialize the records.

If you are putting serialized records into some other store, e.g. a database, and there is
a chance that the different records would be written with different schemas (or versions of
schemas), then you would want to include an indicator of the writer's schema (e.g. a hash
of the writer's schema or a foreign key to a schema's table) along with the record so that
at read time you could provide the correct writer's schema to your org.apache.avro.io.DatumReader.
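
As a rough illustration of the "hash of the writer's schema" approach, here is a minimal sketch in plain Java. It is not Avro's own API: the class name SchemaFingerprintStore, its register/lookup methods, and the in-memory map standing in for a schema table are all hypothetical. The hash follows the 64-bit CRC-64-AVRO fingerprint algorithm given in the Avro specification, but for simplicity it is applied to the schema text as-is; the specification defines fingerprints over the schema's Parsing Canonical Form, and Avro's Java library provides org.apache.avro.SchemaNormalization for that.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a schema table keyed by fingerprint. */
final class SchemaFingerprintStore {
    // Seed and lookup table for CRC-64-AVRO, as described in the Avro specification.
    private static final long EMPTY = 0xc15d213aa4d7a795L;
    private static final long[] FP_TABLE = new long[256];
    static {
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    private final Map<Long, String> schemasByFingerprint = new HashMap<>();

    /** 64-bit fingerprint of the given bytes. */
    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    /** Stores the schema text and returns the fingerprint to keep next to each record. */
    long register(String schemaJson) {
        // Simplification (assumption): hashes the raw text, not the Parsing Canonical Form.
        long fp = fingerprint64(schemaJson.getBytes(StandardCharsets.UTF_8));
        schemasByFingerprint.put(fp, schemaJson);
        return fp;
    }

    /** Returns the writer's schema text for a fingerprint read back from a row. */
    String lookup(long fingerprint) {
        String schema = schemasByFingerprint.get(fingerprint);
        if (schema == null) {
            throw new IllegalArgumentException(
                "unknown schema fingerprint: " + Long.toHexString(fingerprint));
        }
        return schema;
    }

    public static void main(String[] args) {
        SchemaFingerprintStore store = new SchemaFingerprintStore();
        String writerSchema =
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";

        // Write side: one column holds this fingerprint, another the Avro-encoded bytes.
        long fp = store.register(writerSchema);

        // Read side: resolve the fingerprint back to the writer's schema before decoding.
        System.out.println(Long.toHexString(fp) + " -> " + store.lookup(fp));
    }
}
```

At read time, the schema text returned by lookup would be parsed and handed to the DatumReader as the writer's schema. A foreign key into a schema table works the same way, with the key taking the place of the hash.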

From: Sachneet Singh Bains <sachneets.bains@impetus.co.in>
Sent: Tuesday, March 25, 2014 7:18 AM
To: user@avro.apache.org
Subject: Schema not getting saved along with Data


I am new to AVRO and going through the documentation.
From http://avro.apache.org/docs/1.7.6/gettingstartedjava.html
“Data in Avro is always stored with its corresponding schema”

Does the above line mean this is something I ‘explicitly must do’, or is it ‘implicitly done’?
Is it always true, even when we write single records to any stream, or does it apply only when
“Object Container Files” are used?
I tried writing some records to a file using DatumWriter, and I see no schema saved along with them.
Please resolve my confusion.

