tajo-dev mailing list archives

From "David Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-30) Parquet Integration
Date Wed, 06 Nov 2013 03:21:17 GMT

    [ https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814566#comment-13814566 ]

David Chen commented on TAJO-30:

Hi all,

I am new to the Tajo project. I am very excited about Tajo's capabilities and am interested
in contributing to the project.

I am one of the main engineers working on deploying Parquet at LinkedIn and have made several
contributions to the Parquet project, such as adding support for the FIXED_LEN_BYTE_ARRAY
data type and a number of improvements to Avro support.

I am excited to see that Parquet support is planned for Tajo as well. Thanks to Parquet's
generic design, adding Tajo integration would mostly involve writing a FileReader, a FileWriter,
and a SchemaConverter, so that Tajo can automatically convert Parquet schemas and records to Tajo's
internal representation on the read side, and vice versa on the write side. This is the
approach that most of the packages under parquet-mr take, such as parquet-avro and parquet-thrift.
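To make the SchemaConverter piece more concrete, here is a minimal sketch of the type mapping it might perform, written in Java since both Tajo and parquet-mr are Java projects. Everything in it is hypothetical: `SchemaConverterSketch`, `ParquetType` and `TajoType` are simplified stand-ins rather than the real parquet-mr or Tajo classes, the specific pairings (for example INT32 to INT4, BINARY with a UTF-8 annotation to TEXT) are assumptions, and nested or repeated fields are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the enums below are simplified stand-ins, NOT the real
// parquet-mr or Tajo type classes. Only flat schemas are handled.
final class SchemaConverterSketch {

    // Subset of Parquet's physical (primitive) column types.
    enum ParquetType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY }

    // Subset of Tajo-style column types (assumed names).
    enum TajoType { BOOLEAN, INT4, INT8, FLOAT4, FLOAT8, TEXT, BLOB }

    // Read side: Parquet column type -> Tajo column type.
    // utf8 models a BINARY column that is annotated as a UTF-8 string.
    static TajoType toTajo(ParquetType type, boolean utf8) {
        switch (type) {
            case BOOLEAN: return TajoType.BOOLEAN;
            case INT32: return TajoType.INT4;
            case INT64: return TajoType.INT8;
            case FLOAT: return TajoType.FLOAT4;
            case DOUBLE: return TajoType.FLOAT8;
            case BINARY: return utf8 ? TajoType.TEXT : TajoType.BLOB;
            case FIXED_LEN_BYTE_ARRAY: return TajoType.BLOB;
            default: throw new IllegalArgumentException("Unsupported Parquet type: " + type);
        }
    }

    // Write side: Tajo column type -> Parquet column type.
    static ParquetType toParquet(TajoType type) {
        switch (type) {
            case BOOLEAN: return ParquetType.BOOLEAN;
            case INT4: return ParquetType.INT32;
            case INT8: return ParquetType.INT64;
            case FLOAT4: return ParquetType.FLOAT;
            case FLOAT8: return ParquetType.DOUBLE;
            case TEXT: // stored as BINARY; this sketch does not emit the UTF-8 annotation
            case BLOB: return ParquetType.BINARY;
            default: throw new IllegalArgumentException("Unsupported Tajo type: " + type);
        }
    }

    // Converts a flat Parquet schema (column name -> type) to a Tajo schema,
    // preserving column order.
    static Map<String, TajoType> schemaToTajo(Map<String, ParquetType> columns,
                                              Set<String> utf8Columns) {
        Map<String, TajoType> result = new LinkedHashMap<>();
        for (Map.Entry<String, ParquetType> e : columns.entrySet()) {
            result.put(e.getKey(), toTajo(e.getValue(), utf8Columns.contains(e.getKey())));
        }
        return result;
    }
}
```

In a real integration the FileReader and FileWriter would use a converter like this to set up record materialization and writing; keeping the mapping in one place means the read and write sides stay symmetric.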

Min, have you started working on Parquet support for Tajo? If not, would it be fine if I take
this ticket?


> Parquet Integration
> -------------------
>                 Key: TAJO-30
>                 URL: https://issues.apache.org/jira/browse/TAJO-30
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>            Assignee: Dongmin Yu
>              Labels: Parquet
> Parquet is a very promising file format developed by Twitter. We need to investigate the
> applicability of Parquet. If possible, we will implement a Parquet port.
> http://parquet.io/

This message was sent by Atlassian JIRA
