hive-user mailing list archives

From Marcin Tustin <mtus...@handybook.com>
Subject Re: Hive alter table concatenate loses data - can parquet help?
Date Tue, 15 Mar 2016 02:45:47 GMT
Thank you very much for thinking of this. I do not have such files. I will
file a bug as per your suggestion.

On Monday, March 14, 2016, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> Hi Marcin
>
> I came across this issue recently. Do you have old orc files (created with
> hive 0.11) in the table/partition? If so, this patch is required:
>
> https://issues.apache.org/jira/browse/HIVE-13285
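>
> One way to check for old files (a sketch; the path below is a placeholder,
> not a real location from this thread) is to dump the ORC metadata and look
> at the file/writer version reported for each file:

```shell
# Sketch: dump ORC metadata for one file under the table/partition directory.
# The HDFS path is a placeholder; substitute a real file from the table.
hive --orcfiledump hdfs:///path/to/table/partition/000000_0
# Files written by hive 0.11 report an older file version in this output.
```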
>
> Thanks
> Prasanth
>
> On Mar 10, 2016, at 5:02 PM, Prasanth Jayachandran <
> pjayachandran@hortonworks.com> wrote:
>
> After hive 1.2.1, one patch related to alter table concatenation went in:
> https://issues.apache.org/jira/browse/HIVE-12450
>
> I am not sure whether it's related, though. Could you please file a bug for
> this? It would be great if you can attach a small enough repro for the
> issue; I can verify it and provide a fix if it turns out to be a bug.
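>
> In case it helps, a minimal repro along these lines (a sketch; the table
> and column names here are hypothetical, not taken from this thread) would
> look like:

```sql
-- Hypothetical minimal repro: build a small ORC table out of several
-- small files, concatenate, and compare row counts before and after.
CREATE TABLE concat_repro (id BIGINT, payload STRING) STORED AS ORC;
-- Separate INSERTs so the table ends up with multiple small ORC files.
INSERT INTO TABLE concat_repro VALUES (1, 'a');
INSERT INTO TABLE concat_repro VALUES (2, 'b');
INSERT INTO TABLE concat_repro VALUES (3, 'c');
SELECT COUNT(*) FROM concat_repro;   -- count before concatenation
ALTER TABLE concat_repro CONCATENATE;
SELECT COUNT(*) FROM concat_repro;   -- should match the count above
```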
>
> Thanks
> Prasanth
>
> On Mar 8, 2016, at 5:52 AM, Marcin Tustin <mtustin@handybook.com> wrote:
>
> Hi Mich,
>
> DDL as below.
>
> Hi Prasanth,
>
> Hive version as reported by Hortonworks is 1.2.1.2.3.
>
> Thanks,
> Marcin
>
> CREATE TABLE `<tablename>`(
>   `col1` string,
>   `col2` bigint,
>   `col3` string,
>   `col4` string,
>   `col4` string,
>   `col5` bigint,
>   `col6` string,
>   `col7` string,
>   `col8` string,
>   `col9` string,
>   `col10` boolean,
>   `col11` boolean,
>   `col12` string,
>   `metadata` struct<file:string,hostname:string,level:string,line:bigint,logger:string,method:string,millis:bigint,pid:bigint,timestamp:string>,
>   `col14` string,
>   `col15` bigint,
>   `col16` double,
>   `col17` bigint)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://reporting-handy/<path>'
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='true',
>   'numFiles'='2800',
>   'numRows'='297263',
>   'rawDataSize'='454748401',
>   'totalSize'='31310353',
>   'transient_lastDdlTime'='1457437204')
>
> Time taken: 1.062 seconds, Fetched: 34 row(s)
>
> On Tue, Mar 8, 2016 at 4:29 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>
>> Hi
>>
>> Can you please provide the DDL for this table: "show create table <TABLE>"
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 7 March 2016 at 23:25, Marcin Tustin <mtustin@handybook.com> wrote:
>>
>>> Hi All,
>>>
>>> Following on from our parquet vs orc discussion, today I observed
>>> hive's alter table ... concatenate command remove rows from an ORC
>>> formatted table.
>>>
>>> 1. Has anyone else observed this (fuller description below)? And
>>> 2. How do parquet users handle the file fragmentation issue?
>>>
>>> Description of the problem:
>>>
>>> Today I ran a query to count rows by date. Relevant days below:
>>> 2016-02-28 16866
>>> 2016-03-06 219
>>> 2016-03-07 2863
>>> I then ran concatenation on that table. Rerunning the same query
>>> resulted in:
>>>
>>> 2016-02-28 16866
>>> 2016-03-06 219
>>> 2016-03-07 1158
>>>
>>> Note the reduced count for 2016-03-07.
>>>
>>> I then ran concatenation a second time, and the query a third time:
>>> 2016-02-28 16344
>>> 2016-03-06 219
>>> 2016-03-07 1158
>>>
>>> Now the count for 2016-02-28 is reduced.
>>>
>>> This doesn't look like deduplication by design - the losses didn't all
>>> happen on the first run of concatenation. It looks like concatenation
>>> simply loses data.
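>>>
>>> (For reference, the per-day counts above came from a query of roughly
>>> this shape; `event_date` is a placeholder, not the real column name:)

```sql
-- Sketch of the count-by-date check described above; `event_date` and
-- <tablename> stand in for the real column and table names.
SELECT event_date, COUNT(*) AS cnt
FROM <tablename>
GROUP BY event_date
ORDER BY event_date;
-- then:
ALTER TABLE <tablename> CONCATENATE;
-- and rerun the SELECT to compare per-day counts.
```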
>>>
>>>
>>>
>>> Want to work at Handy? Check out our culture deck and open roles
>>> <http://www.handy.com/careers>
>>> Latest news <http://www.handy.com/press> at Handy
>>> Handy just raised $50m
>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>>> led by Fidelity
>>>
>>>
>>
>


