kylin-issues mailing list archives

From "Shaofeng SHI (JIRA)" <>
Subject [jira] [Commented] (KYLIN-3070) Enable 'kylin.source.hive.flat-table-storage-format' for flat table storage format
Date Thu, 24 May 2018 12:03:00 GMT


Shaofeng SHI commented on KYLIN-3070:

[~seva_ostapenko] Hello Vsevolod, after switching the format of the intermediate table from
sequence file to Parquet, did you observe a performance improvement? Because Kylin processes
the data row by row, I suspect that changing to Parquet may not help, and the columnar
compression may even degrade performance. I just want to see whether you have such
information.

> Enable 'kylin.source.hive.flat-table-storage-format' for flat table storage format
> ----------------------------------------------------------------------------------
>                 Key: KYLIN-3070
>                 URL:
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.2.0
>         Environment: HDP 2.5.6, Kylin 2.2.0
>            Reporter: Vsevolod Ostapenko
>            Assignee: Vsevolod Ostapenko
>            Priority: Major
>              Labels: newbie
>             Fix For: v2.3.0
>         Attachments: KYLIN-3070.master.001.patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> Flat table storage format is currently hard-coded as SEQUENCEFILE in the core-job/src/main/java/org/apache/kylin/job/
> That prevents using Impala as a SQL engine while using the beeline CLI (via a custom JDBC URL),
> as Impala cannot write sequence files.
> Adding a parameter to override the default setting would address
> the issue.
> Removing the hard-coded value for the storage format might be a good idea in and of itself.
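
The change described in the issue can be illustrated with a minimal sketch. It is written in Python for brevity and is not Kylin's actual code. The function names and the DDL layout are hypothetical; only the property name and the SEQUENCEFILE default come from the issue text.

```python
# Illustrative sketch only: hypothetical helper names, not Kylin's implementation.
FORMAT_KEY = "kylin.source.hive.flat-table-storage-format"  # property named in the issue
DEFAULT_FORMAT = "SEQUENCEFILE"  # the value the issue describes as hard-coded


def flat_table_storage_clause(config):
    """Return the STORED AS clause, falling back to the default when unset."""
    fmt = (config.get(FORMAT_KEY) or DEFAULT_FORMAT).strip().upper()
    return "STORED AS " + fmt


def create_flat_table_ddl(table, columns, config):
    """Build a CREATE TABLE statement for a (hypothetical) intermediate flat table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {table} ({cols}) {flat_table_storage_clause(config)}"


if __name__ == "__main__":
    columns = [("seller_id", "BIGINT"), ("price", "DOUBLE")]
    # No override: the sequence file default is kept.
    print(create_flat_table_ddl("flat_tbl", columns, {}))
    # Override, e.g. for an engine that cannot write sequence files.
    print(create_flat_table_ddl("flat_tbl", columns, {FORMAT_KEY: "parquet"}))
```

With no override the clause stays STORED AS SEQUENCEFILE, so existing deployments would behave as before; setting the property switches only the storage clause.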

This message was sent by Atlassian JIRA
