hive-user mailing list archives

From Biswajit Nayak <biswa...@altiscale.com>
Subject Re: Hive Cli ORC table read error with limit option
Date Tue, 19 Apr 2016 02:14:44 GMT
Thanks, Prasanth, for the update. I will test it and post the outcome
here.

Thanks
Biswa

On Tue, Apr 19, 2016 at 6:26 AM, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> Hi Biswajit
>
> You might need the patch from https://issues.apache.org/jira/browse/HIVE-11546
>
> Can you apply this patch to your hive build and see if it solves the
> issue? (recommended)
>
> Alternatively, you can set hive.exec.orc.split.strategy=BI as a
> workaround.
> Using this config is strongly discouraged, as it will disable split
> elimination and may generate sub-optimal splits, resulting in less
> map-side parallelism.
> This config is provided only as a workaround and is suitable when all
> ORC files are small (less than the stripe size or block size).
>
> Thanks
> Prasanth
>
>
> On Apr 18, 2016, at 7:44 PM, Biswajit Nayak <biswajit@altiscale.com>
> wrote:
>
> Hi All,
>
> I seriously need help with this. Any reference or pointer to
> troubleshoot or fix it would be helpful.
>
> Regards
> Biswa
>
> On Fri, Mar 25, 2016 at 11:24 PM, Biswajit Nayak <biswajit@altiscale.com>
> wrote:
>
>> Prasanth,
>>
>> Apologies for the delay in response.
>>
>> Below is the orcfiledump of the empty orc file from a broken partition.
>>
>> $ hive --orcfiledump /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
>> Structure for /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
>> File Version: 0.12 with HIVE_8732
>> 16/03/25 17:49:09 INFO orc.ReaderImpl: Reading ORC rows from /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
>> 16/03/25 17:49:09 INFO orc.RecordReaderFactory: Schema is not specified on read. Using file schema.
>> Rows: 0
>> Compression: SNAPPY
>> Compression size: 262144
>> Type: struct<>
>>
>> Stripe Statistics:
>>
>> File Statistics:
>>   Column 0: count: 0 hasNull: false
>>
>> Stripes:
>>
>> File length: 49 bytes
>> Padding length: 0 bytes
>> Padding ratio: 0%
>> $
>>
>>
>> I am still not able to figure out what is causing this odd behaviour.
>>
>>
>> Regards
>> Biswa
>>
>> On Thu, Mar 10, 2016 at 3:12 PM, Prasanth Jayachandran <
>> pjayachandran@hortonworks.com> wrote:
>>
>>> Alternatively, you can send the orcfiledump output for the empty ORC file
>>> from a broken partition.
>>>
>>> Thanks
>>> Prasanth
>>>
>>> On Mar 10, 2016, at 5:11 PM, Prasanth Jayachandran <
>>> pjayachandran@hortonworks.com> wrote:
>>>
>>> Could you attach the empty ORC files from one of the broken partitions
>>> somewhere? I can run some tests on them to see why it's happening.
>>>
>>> Thanks
>>> Prasanth
>>>
>>> On Mar 8, 2016, at 12:02 AM, Biswajit Nayak <biswajit@altiscale.com>
>>> wrote:
>>>
>>> Both the parameters are set to false by default.
>>>
>>> hive> set hive.optimize.index.filter;
>>> hive.optimize.index.filter=false
>>> hive> set hive.orc.splits.include.file.footer;
>>> hive.orc.splits.include.file.footer=false
>>> hive>
>>>
>>> > I suspect this might be related to having 0 row files in the buckets
>>> > not having any recorded schema.
>>>
>>> Yes, there are a few files with 0 rows, but the query works with other
>>> partitions (which have 0-row files). Out of 30 partitions (for a month),
>>> 3-4 partitions have this issue. Even reloading the data does not change
>>> anything. The query works fine in MR now, but has the issue in Tez.
>>>
>>>
>>>
>>> On Tue, Mar 8, 2016 at 2:43 AM, Gopal Vijayaraghavan <gopalv@apache.org>
>>> wrote:
>>>
>>>>
>>>> > c                varchar(2)
>>>> ...
>>>> > Num Buckets:         7
>>>>
>>>> I suspect this might be related to having 0 row files in the buckets not
>>>> having any recorded schema.
>>>>
>>>> You can also experiment with hive.optimize.index.filter=false, to see if
>>>> the zero row case is artificially produced via predicate push-down.
>>>>
>>>>
>>>> That shouldn't be a problem unless you've turned on
>>>> hive.orc.splits.include.file.footer=true (recommended to be false).
>>>>
>>>> Your row-locations don't actually match any Apache source jar in my
>>>> builds, are there any other patches to consider?
>>>>
>>>> Cheers,
>>>> Gopal
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
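
For reference, the workaround Prasanth suggests above could be tried in a
Hive CLI session roughly as follows. This is a minimal sketch, not a tested
fix: the property name and the BI value come from his message, while the
SELECT statement, the table name testdb.table_orc (inferred from the
anonymized dump path) and the LIMIT value are assumptions for illustration.

```sql
-- Session-level workaround only; the HIVE-11546 patch is the recommended fix.
-- Accepted values are HYBRID (the default), BI and ETL.
set hive.exec.orc.split.strategy=BI;

-- Print the current value to confirm the setting took effect.
set hive.exec.orc.split.strategy;

-- Hypothetical query: the thread does not quote the failing statement,
-- so the table name, partition filter and LIMIT value are assumptions.
SELECT * FROM testdb.table_orc
WHERE year = 2016 AND month = 1 AND day = 29
LIMIT 10;
```

Because the property is set with "set" inside the session, it does not
persist after the CLI exits, which confines the cost of disabled split
elimination to the queries that need the workaround.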
