flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-7243) Add ParquetInputFormat
Date Tue, 11 Sep 2018 09:33:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610343#comment-16610343 ]

ASF GitHub Bot commented on FLINK-7243:
---------------------------------------

lvhuyen commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-420210267
 
 
   Thank you @HuangZhenQiu.
   
   I created a [quick patch](https://github.com/lvhuyen/flink/tree/parquet_input_format(7243))
for that Timestamp conversion, and my Flink job has been running well with ParquetPojoInputFormat.
   
   One more issue I found in the current build concerns Parquet files that have a column of
type array of primitives. Per the Parquet format spec, the schema for such a column has two layers
of GroupType, but the method _ParquetSchemaConverter.convertField(final Type fieldType)_
handles only one. As a result, we do not get an array of primitive types, but an
ObjectArray that has only one item.
   This issue affects ParquetMapInputFormat only, so I have not done anything about it in my
branch.
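
To illustrate the nesting described above, here is a minimal, self-contained sketch. It does not use the real Parquet or Flink classes: `Node`, `describeList` and `describeListOneLayer` are hypothetical stand-ins invented for this example, and the column name `scores` is made up. The schema it models follows the standard list layout, an outer group annotated as LIST that contains a repeated group, which in turn contains the element field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ListSchemaSketch {

    /** Hypothetical stand-in for a schema node; NOT org.apache.parquet.schema.Type. */
    static final class Node {
        final String name;
        final String primitive;     // e.g. "INT32"; null when the node is a group
        final boolean repeated;
        final List<Node> children;  // empty for a primitive

        private Node(String name, String primitive, boolean repeated, List<Node> children) {
            this.name = name;
            this.primitive = primitive;
            this.repeated = repeated;
            this.children = children;
        }

        static Node primitive(String name, String type) {
            return new Node(name, type, false, Collections.<Node>emptyList());
        }

        static Node group(String name, boolean repeated, Node... children) {
            return new Node(name, null, repeated, Arrays.asList(children));
        }

        boolean isPrimitive() {
            return primitive != null;
        }
    }

    private static String describe(Node element) {
        return element.isPrimitive()
                ? "Array<" + element.primitive + ">"
                : "Array<group " + element.name + ">";
    }

    /** Unwraps only the outer group, which mirrors the behaviour reported above. */
    static String describeListOneLayer(Node listField) {
        return describe(listField.children.get(0));
    }

    /** Unwraps both group layers before looking at the element type. */
    static String describeList(Node listField) {
        // Layer 1: the outer LIST group must contain exactly one child.
        if (listField.isPrimitive() || listField.children.size() != 1) {
            throw new IllegalArgumentException("not a list group: " + listField.name);
        }
        Node repeatedGroup = listField.children.get(0);
        // Layer 2: that child is a repeated group wrapping exactly one element field.
        if (repeatedGroup.isPrimitive() || !repeatedGroup.repeated
                || repeatedGroup.children.size() != 1) {
            throw new IllegalArgumentException("unexpected list layout: " + listField.name);
        }
        return describe(repeatedGroup.children.get(0));
    }

    public static void main(String[] args) {
        // optional group scores (LIST) { repeated group list { required int32 element; } }
        Node scores = Node.group("scores", false,
                Node.group("list", true,
                        Node.primitive("element", "INT32")));

        System.out.println(describeListOneLayer(scores)); // Array<group list>
        System.out.println(describeList(scores));         // Array<INT32>
    }
}
```

With only one layer unwrapped, the element is seen as the inner group rather than the primitive, which is consistent with getting a single-item object array instead of an array of primitives.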

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Add ParquetInputFormat
> ----------------------
>
>                 Key: FLINK-7243
>                 URL: https://issues.apache.org/jira/browse/FLINK-7243
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: Zhenqiu Huang
>            Priority: Major
>              Labels: pull-request-available
>
> Add a {{ParquetInputFormat}} to read data from an Apache Parquet file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
