avro-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Schema not getting saved along with Data
Date Wed, 26 Mar 2014 09:55:46 GMT
Hi Sachneet,


On Wed, Mar 26, 2014 at 8:37 AM, Sachneet Singh Bains <
sachneets.bains@impetus.co.in> wrote:

>  Hi Sean,
>
>
>
> My use case is to store incoming data(various sources) into a database
> like Cassandra. The data will be serialized using AVRO.
>

It would be foolish of me NOT to put in a plug here for Apache Gora [0].
Gora is an acronym for Generic Object Representation using Avro, so it may
well do exactly what you are trying to do out of the box. Cassandra is
just one of the NoSQL databases we support in Gora. You can find out more by
reading the site documentation.

[0] http://gora.apache.org


> My questions are:
>
> 1.       What is the best way to do this ?
>
Right now the gora-cassandra module supports the following Avro data types:
Type.STRING, Type.BOOLEAN, Type.BYTES, Type.DOUBLE, Type.FLOAT, Type.INT,
Type.LONG, Type.FIXED, Type.ARRAY, Type.MAP, Type.UNION, Type.RECORD. For a
more comprehensive overview of how we actually store the data, please post
your question to dev@gora and we will reply in full.


> 2.       How should I keep the schema information along with each record
> ? For e.g. two columns , one storing data and another schema/fingerprints ?
>
Well, this is certainly an option. Right now, though, it appears that we store
(prepend) the Schema with the data as it is. The storage logic is currently
focused on the data and not on the data schema/fingerprints.
Therefore, when executing Gora queries in Cassandra, we query the Cassandra
keyspace by families. When we add sub/supercolumns, Gora keys are mapped to
Cassandra partition keys only. This is because we follow the Cassandra
logic where column family data is partitioned across nodes based on the row
key. You would therefore need to change some aspect of the data modeling if
you really wished to store metadata such as the Schema and fingerprints
separately.
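
To make the "two columns" idea from your question a bit more concrete, here
is a rough sketch in plain Python. It is NOT how Gora stores data today, and
all of the names in it (schema_fingerprint, SchemaRepository, make_row) are
hypothetical. It also hashes a whitespace-normalised copy of the schema text
with SHA-256, which is only a stand-in for a fingerprint computed over Avro's
Parsing Canonical Form.

```python
import hashlib
import json


def schema_fingerprint(schema_json: str) -> str:
    """Return a hex SHA-256 digest of a normalised copy of the schema text.

    Assumption: re-serialising with sorted keys and no whitespace is used as
    a simple normalisation. This is NOT Avro's Parsing Canonical Form.
    """
    normalised = json.dumps(
        json.loads(schema_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class SchemaRepository:
    """Hypothetical in-memory schema store keyed by fingerprint."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema_json: str) -> str:
        """Store the schema once and return its fingerprint."""
        fingerprint = schema_fingerprint(schema_json)
        self._schemas[fingerprint] = schema_json
        return fingerprint

    def lookup(self, fingerprint: str) -> str:
        """Return the schema text registered under this fingerprint."""
        return self._schemas[fingerprint]


def make_row(key: str, payload: bytes, fingerprint: str) -> dict:
    """Build one row: the serialized data in one column, the fingerprint in another."""
    return {"key": key, "data": payload, "schema_fp": fingerprint}
```

A reader would then fetch a row, look up row["schema_fp"] in the repository,
and use the returned schema to decode row["data"]. The schema itself is
stored once per version rather than once per record.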


> 3.       I see fingerprints as one option but how to make use of it ;
> where to maintain the schema repository and how to add fingerprints to data
>
I've never used fingerprints so I cannot comment. Sorry!


> 4.        Also, I am wondering if there is ant feature to automatically
> generate a schema from an incoming data (CSV format) ?
>

Everything for Java is Mavenized. There will be no Ant target. You could
possibly write an implementation for avro-tools which would achieve this
for you. You can see the current options in avro-tools by looking into the
Main#Main() method:
https://svn.apache.org/repos/asf/avro/trunk/lang/java/tools/src/main/java/org/apache/avro/tool/Main.java
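
To give an idea of what such a tool might do, here is a rough standalone
sketch in plain Python (standard library only, so it is not an avro-tools
implementation). It guesses an Avro record schema from a CSV header and its
rows. The function names, the type-guessing order (boolean, long, double,
string) and the rule that empty cells make a field nullable are all my own
assumptions, and it assumes the header names are already valid Avro names.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest Avro primitive that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]

    def all_parse(cast):
        # True if every non-empty value can be converted with `cast`.
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if non_empty and all(v.lower() in ("true", "false") for v in non_empty):
        base = "boolean"
    elif non_empty and all_parse(int):
        base = "long"
    elif non_empty and all_parse(float):
        base = "double"
    else:
        base = "string"

    # Assumption: any empty cell means the field is optional.
    if len(non_empty) < len(values):
        return ["null", base]
    return base


def infer_schema(csv_text, record_name):
    """Return an Avro record schema (as a dict) guessed from CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    fields = []
    for i, name in enumerate(header):
        # Treat cells missing from short rows as empty.
        column = [row[i] if i < len(row) else "" for row in rows]
        fields.append({"name": name, "type": infer_type(column)})
    return {"type": "record", "name": record_name, "fields": fields}
```

Passing the result through json.dumps gives schema text you could save as an
.avsc file. A real tool would also need to handle dates, very large files and
header names that are not valid Avro names.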

> 5.       Is there any recommended database to store data in AVRO format
> (relational or Nosql) ?
>
No, there is no recommended DB. LOADS of use cases use many different DBs.
I would suggest you consider your data and how you will be querying it
before you choose your DB.

Hopefully some of the above gives food for thought.
Lewis
