tajo-issues mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-711) Add Avro storage support
Date Wed, 16 Apr 2014 05:08:15 GMT

    [ https://issues.apache.org/jira/browse/TAJO-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970429#comment-13970429 ]

Hyunsik Choi commented on TAJO-711:

This is my comment on the concept of a schema-evolving table.

A few days ago, I discussed your idea with Hyoungjun offline. We were very happy to see such an
interesting idea. I got some additional suggestions from Hyoungjun, and I have added some concrete
ideas of my own to them.

I'd like to state some assumptions and define some terms before I discuss the idea.

 * A partitioned table has a schema.
  ** Let us call this schema 'parent schema'.
 * Each partition has its own schema.
  ** Let us call this schema 'partition schema'.
 * Let us call this kind of table 'a schema-evolving table'.

 (I know that my naming sense is not good. These are temporary names. I hope that someone
suggests better ones.)

The rough idea is as follows:

 * Even though a schema is actually an ordered set of fields, we treat a schema as just a
set of fields when we deal with the relationship between the parent schema and partition schemas.
 * The schema of a schema-evolving table (the parent schema) must be a superset of the fields in
all partition schemas.
 * In other words, the field set of each partition schema must be a subset of the parent schema.
 * Fields with the same name must have the same data type in all partition schemas and in the
parent schema.
 * Partition schemas can differ from one another.
 * The order of fields can differ among partitions. (This is because we see the fields just as
a set.)
 * Fields newly introduced by a new partition are appended to the tail of the parent schema.
  ** This schema maintenance will be performed when 'ALTER TABLE ADD PARTITION' is executed.
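
As a rough illustration of the rules above, here is a minimal Python sketch of the schema
maintenance step. It is not Tajo code: the function name add_partition_schema, the
(name, type) tuple representation, and the type names are all assumptions made only for this
example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a schema is an ordered list of (field name, data type) pairs.
Field = Tuple[str, str]


def add_partition_schema(parent: List[Field], partition: List[Field]) -> List[Field]:
    """Return the parent schema after a partition schema has been registered.

    Same-named fields must have the same data type; fields that the parent
    schema does not have yet are appended to its tail.
    """
    known_types: Dict[str, str] = dict(parent)
    merged = list(parent)
    for name, dtype in partition:
        if name in known_types:
            # Rule: same-named fields must have the same data type.
            if known_types[name] != dtype:
                raise ValueError(
                    f"field '{name}' is {known_types[name]} in the parent schema "
                    f"but {dtype} in the partition schema"
                )
        else:
            # Rule: newly added fields go to the tail of the parent schema.
            merged.append((name, dtype))
            known_types[name] = dtype
    return merged


parent_schema = [("id", "INT8"), ("name", "TEXT")]
# Field order inside a partition does not matter, because fields are seen as a set.
partition_schema = [("score", "FLOAT8"), ("id", "INT8")]
parent_schema = add_partition_schema(parent_schema, partition_schema)
print(parent_schema)
```

After this call the parent schema is a superset of the partition schema, and the existing
parent fields keep their original positions.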

In the planning phase, Tajo will use only the parent schema, and then it will rewrite the projection
plan for each partition if needed. When a field required by a query has no corresponding field
in a certain partition, that field will be treated as NULL while the partition is processed.
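
The per-partition projection could look roughly like the following Python sketch. Again, this
is only an illustration under assumptions: project_partition_row is a hypothetical helper, rows
are plain lists, and None stands in for NULL.

```python
from typing import Any, List, Optional


def project_partition_row(
    requested: List[str], partition_fields: List[str], row: List[Any]
) -> List[Optional[Any]]:
    """Project one row of a partition onto the fields a query requires.

    `requested` holds field names from the parent schema, and
    `partition_fields` holds the field names of the partition schema in the
    order in which values appear in `row`. Fields that the partition does
    not have are filled with None (standing in for NULL).
    """
    position = {name: i for i, name in enumerate(partition_fields)}
    return [row[position[name]] if name in position else None for name in requested]


# The query needs 'id' and 'score', but this partition only stores 'name' and 'id'.
result = project_partition_row(["id", "score"], ["name", "id"], ["alice", 7])
print(result)  # [7, None]
```

Because the lookup goes by field name, the differing field order among partitions does not
affect the result.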

> Add Avro storage support
> ------------------------
>                 Key: TAJO-711
>                 URL: https://issues.apache.org/jira/browse/TAJO-711
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: David Chen
>            Assignee: David Chen
>         Attachments: TAJO-711.patch, TAJO-711.patch, TAJO-711_140415_rebased.patch, TAJO-711_20140413_20:36:40.patch,
> TAJO-711_20140413_21:00:34.patch, TAJO-711_20140413_21:46:27.patch, TAJO-711_20140414_11:07:13.patch,
> Add {{FileScanner}} and {{FileAppender}} for reading from and writing to Avro.

