hive-user mailing list archives

From Biswajit Nayak <biswa...@altiscale.com>
Subject Re: Hive Cli ORC table read error with limit option
Date Fri, 25 Mar 2016 17:54:52 GMT
Prashanth,

Apologies for the delay in response.

Below is the orcfiledump of the empty orc file from a broken partition.

$ hive --orcfiledump /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
Structure for /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
File Version: 0.12 with HIVE_8732
16/03/25 17:49:09 INFO orc.ReaderImpl: Reading ORC rows from /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
16/03/25 17:49:09 INFO orc.RecordReaderFactory: Schema is not specified on read. Using file schema.
Rows: 0
Compression: SNAPPY
Compression size: 262144
Type: struct<>

Stripe Statistics:

File Statistics:
  Column 0: count: 0 hasNull: false

Stripes:

File length: 49 bytes
Padding length: 0 bytes
Padding ratio: 0%


I am still not able to figure out what is causing this odd behaviour.
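
To compare this file against the 0-row files in the partitions that work, the dump output could be summarised with a small script. This is only a rough sketch: it assumes the dump text has the same "Rows:", "Type:" and "File length:" lines shown above, and the function names (summarize_orc_dump, is_schemaless_empty_file) are made up for illustration. Whether an empty struct<> schema is really the trigger is still only a suspicion.

```python
import re


def summarize_orc_dump(dump_text):
    """Pull row count, schema type and file length out of orcfiledump text.

    Assumes the line layout shown in the dump above; stray '*' markers
    around a line (as added by some mail clients) are stripped first.
    """
    summary = {"rows": None, "type": None, "file_length": None}
    for raw in dump_text.splitlines():
        line = raw.strip().strip("*").strip()
        m = re.match(r"Rows:\s*(\d+)$", line)
        if m:
            summary["rows"] = int(m.group(1))
            continue
        m = re.match(r"Type:\s*(.+)$", line)
        if m:
            summary["type"] = m.group(1)
            continue
        m = re.match(r"File length:\s*(\d+)\s*bytes$", line)
        if m:
            summary["file_length"] = int(m.group(1))
    return summary


def is_schemaless_empty_file(summary):
    """True when the file has 0 rows and an empty struct<> schema."""
    return summary["rows"] == 0 and summary["type"] == "struct<>"
```

Running this over the dump of each 0-row file in both a working and a broken partition would show whether the broken ones differ in the recorded schema.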


Regards
Biswa

On Thu, Mar 10, 2016 at 3:12 PM, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> Alternatively you can send orcfiledump output for the empty orc file from
> broken partition.
>
> Thanks
> Prasanth
>
> On Mar 10, 2016, at 5:11 PM, Prasanth Jayachandran <
> pjayachandran@hortonworks.com> wrote:
>
> Could you attach the empty ORC files from one of the broken partitions
> somewhere? I can run some tests on them to see why it's happening.
>
> Thanks
> Prasanth
>
> On Mar 8, 2016, at 12:02 AM, Biswajit Nayak <biswajit@altiscale.com>
> wrote:
>
> Both the parameters are set to false by default.
>
> hive> set hive.optimize.index.filter;
> hive.optimize.index.filter=false
> hive> set hive.orc.splits.include.file.footer;
> hive.orc.splits.include.file.footer=false
>
> >>> I suspect this might be related to having 0 row files in the buckets
> >>> not having any recorded schema.
>
> Yes, there are a few files with 0 rows, but the query works with other
> partitions that also have 0-row files. Out of 30 partitions (for a month),
> 3-4 partitions have this issue. Even reloading the data does not change
> anything. The query works fine in MR now, but still has the issue in Tez.
>
>
>
> On Tue, Mar 8, 2016 at 2:43 AM, Gopal Vijayaraghavan <gopalv@apache.org>
> wrote:
>
>>
>> > c                varchar(2)
>> ...
>> > Num Buckets:         7
>>
>> I suspect this might be related to having 0 row files in the buckets not
>> having any recorded schema.
>>
>> You can also experiment with hive.optimize.index.filter=false, to see if
>> the zero row case is artificially produced via predicate push-down.
>>
>>
>> That shouldn't be a problem unless you've turned on
>> hive.orc.splits.include.file.footer=true (recommended to be false).
>>
>> Your row-locations don't actually match any Apache source jar in my
>> builds, are there any other patches to consider?
>>
>> Cheers,
>> Gopal
>>
>>
>>
>
>
>
