hive-issues mailing list archives

From "Tao Li (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
Date Tue, 19 Jul 2016 05:21:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383593#comment-15383593
] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:20 AM:
--------------------------------------------------------

I am pretty new to Hive, so correct me if I am wrong.

1. We probably want to set incremental to true by default. It may be even better to deprecate
the buffered row mode completely because of the OOM issue. I don't think this is a breaking change,
since it does not affect the query result. I am not sure what the correct behavior should be with
"--incremental=false", though (maybe we have to stick with "buffered page" mode for that case).

2. We may want to keep the IncrementalRows class unchanged and define a subclass (e.g. IncrementalRowsWithNormalization).
The reason is that the non-table formats don't require column width normalization at all, so
it's better to isolate the normalization-related code from these formats. Without any code
change (other than setting the default of incremental to true), the non-table formats should
just work. Only the table format will involve the normalization code path (e.g. your incremental
normalization code).
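
To make point 2 concrete, here is a rough sketch of the subclass idea. It is not Beeline code: PlainIncrementalRows and NormalizingIncrementalRows are hypothetical stand-ins for IncrementalRows and the proposed IncrementalRowsWithNormalization, rows are plain String arrays, and the batch size is just a constructor argument. The point is only that the base class stays untouched and the padding logic lives entirely in the subclass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for IncrementalRows: passes each row through untouched,
// which is all the non-table formats need.
class PlainIncrementalRows implements Iterator<String[]> {
    protected final Iterator<String[]> source;

    PlainIncrementalRows(Iterator<String[]> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public String[] next() {
        return source.next();
    }
}

// Hypothetical stand-in for IncrementalRowsWithNormalization: buffers up to
// batchSize rows and pads each cell to the widest value in its column.
// Assumes every row has the same number of columns and no null cells.
class NormalizingIncrementalRows extends PlainIncrementalRows {
    private final int batchSize;
    private Iterator<String[]> batch = Collections.emptyIterator();

    NormalizingIncrementalRows(Iterator<String[]> source, int batchSize) {
        super(source);
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return batch.hasNext() || source.hasNext();
    }

    @Override
    public String[] next() {
        if (!batch.hasNext()) {
            batch = readNormalizedBatch().iterator();
        }
        return batch.next();
    }

    // Pulls the next batch from the source and pads its cells column by column.
    private List<String[]> readNormalizedBatch() {
        List<String[]> rows = new ArrayList<>();
        while (rows.size() < batchSize && source.hasNext()) {
            rows.add(source.next());
        }
        int columns = rows.isEmpty() ? 0 : rows.get(0).length;
        int[] widths = new int[columns];
        for (String[] row : rows) {
            for (int i = 0; i < columns; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }
        List<String[]> padded = new ArrayList<>();
        for (String[] row : rows) {
            String[] out = new String[columns];
            for (int i = 0; i < columns; i++) {
                StringBuilder cell = new StringBuilder(row[i]);
                while (cell.length() < widths[i]) {
                    cell.append(' ');
                }
                out[i] = cell.toString();
            }
            padded.add(out);
        }
        return padded;
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
            new String[] {"a", "bbb"},
            new String[] {"cc", "d"},
            new String[] {"eeee", "f"});
        Iterator<String[]> rows = new NormalizingIncrementalRows(data.iterator(), 2);
        while (rows.hasNext()) {
            System.out.println(String.join("|", rows.next()));
        }
    }
}
```

A non-table format would keep using the plain class, so it never pays for the buffering; only the table format would be handed the normalizing subclass.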



> Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14170
>                 URL: https://issues.apache.org/jira/browse/HIVE-14170
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Beeline
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed out immediately.
> However, if {{TableOutputFormat}} is used with this option the formatting can look really
> off.
> The reason is that {{IncrementalRows}} does not do a global calculation of the optimal
> width size for {{TableOutputFormat}} (it can't because it only sees one row at a time). The
> output of {{BufferedRows}} looks much better because it can do this global calculation.
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width should be
> re-calculated every "x" rows ("x" can be configurable and by default it can be 1000).
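
The "re-calculate every x rows" behavior described above could look roughly like the following. This is only an illustrative sketch, not the attached patch: BatchedTablePrinter and its methods are hypothetical names, rows are assumed to be equal-length String arrays with no null cells, and the default of 1000 is simply the value suggested in the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Beeline.
class BatchedTablePrinter {
    // Default "x" suggested in the issue description.
    static final int DEFAULT_BATCH_SIZE = 1000;

    // Widest cell per column across one batch of rows.
    static int[] columnWidths(List<String[]> batch) {
        int columns = batch.isEmpty() ? 0 : batch.get(0).length;
        int[] widths = new int[columns];
        for (String[] row : batch) {
            for (int i = 0; i < columns; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }
        return widths;
    }

    // Renders one row as "| cell | cell |", left-aligned and padded to widths.
    static String formatRow(String[] row, int[] widths) {
        StringBuilder line = new StringBuilder("|");
        for (int i = 0; i < row.length; i++) {
            line.append(' ').append(row[i]);
            for (int pad = row[i].length(); pad < widths[i]; pad++) {
                line.append(' ');
            }
            line.append(" |");
        }
        return line.toString();
    }

    // Buffers at most batchSize rows, then emits them with widths computed
    // for that batch only, so memory use stays bounded by the batch size.
    static void print(Iterator<String[]> rows, int batchSize, Consumer<String> out) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String[]> batch = new ArrayList<>();
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize || !rows.hasNext()) {
                int[] widths = columnWidths(batch);
                for (String[] row : batch) {
                    out.accept(formatRow(row, widths));
                }
                batch.clear();
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
            new String[] {"a", "bbb"},
            new String[] {"cc", "d"},
            new String[] {"eeee", "f"});
        // A batch size of 2 is used here only to make the width change visible.
        print(data.iterator(), 2, System.out::println);
    }
}
```

With a batch size of 2, the first two rows share one set of widths and the third row gets its own, which shows the trade-off: output is aligned within a batch but the column widths can shift between batches.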



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
