avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-251) add schema for schemas
Date Thu, 18 Feb 2010 06:29:28 GMT

    [ https://issues.apache.org/jira/browse/AVRO-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835132#action_12835132 ]

Scott Carey commented on AVRO-251:

I am using DataFileReader/DataFileWriter, and the header is about 5K in size because the
whole schema is stored as text.

I'm not sure the approach in this ticket is the best one for the file format, but some way to
persist a schema in a compact form would be useful. A binary format would be smaller, but every
field and type name would still have to be there as text. Maybe, for the data file, we could just
store the schema as a string, deflate-compressed. That might cost more CPU than a compact binary
schema representation, but it could be handled cleanly in general: if the first byte of a byte[]
that represents a schema is a special marker value (one that is invalid in JSON), then the
remaining bytes are compressed JSON; otherwise it's UTF-8 JSON.
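
To make the marker idea concrete, here is a minimal sketch in Java. It is not part of Avro or of the attached patch: the class name SchemaBytes, the method names, and the choice of 0x00 as the marker are all assumptions for illustration. 0x00 is used only because a UTF-8 JSON document cannot begin with a NUL byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only; not an Avro class.
final class SchemaBytes {
    // Assumed marker value: UTF-8 JSON text can never start with 0x00.
    static final byte COMPRESSED_MARKER = 0x00;

    // Returns plain UTF-8 JSON, or the marker byte followed by deflated JSON.
    static byte[] encode(String schemaJson, boolean compress) {
        byte[] utf8 = schemaJson.getBytes(StandardCharsets.UTF_8);
        if (!compress) {
            return utf8;
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(utf8);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(COMPRESSED_MARKER);
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inspects the first byte to decide whether the rest must be inflated.
    static String decode(byte[] stored) {
        if (stored.length == 0 || stored[0] != COMPRESSED_MARKER) {
            return new String(stored, StandardCharsets.UTF_8);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored, 1, stored.length - 1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed schema");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed schema", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A reader would call SchemaBytes.decode(bytes) on whatever it finds in the header and get JSON text back either way, so existing uncompressed files would stay readable. A writer could choose per file whether compression is worth it, since very small schemas may not shrink once the marker and deflate overhead are added.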

My largest schema is 6.3k as a string including whitespace ('pretty printed'), and 4.9k without
whitespace, as printed by Schema.toString().
It is 1.3k compressed by gzip -5 or higher, and 1.5k by gzip -1.

> add schema for schemas
> ----------------------
>                 Key: AVRO-251
>                 URL: https://issues.apache.org/jira/browse/AVRO-251
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-251.patch, AVRO-251.patch
> A schema for schemas would permit schemas to be written in binary.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
