orc-dev mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: getting read past EOF for Double column
Date Wed, 27 Dec 2017 17:13:01 GMT
I've filed this as https://issues.apache.org/jira/browse/ORC-285 . Sorry
for the delay in getting the fix out.

.. Owen

On Mon, Dec 18, 2017 at 10:27 AM, Owen O'Malley <owen.omalley@gmail.com>
wrote:

> This is a bug. Please file a jira. It looks like a change went in that
> made the DoubleTreeReader fail if it is called on a batch of size 0.
>
> Thanks,
>    Owen
>
> On Mon, Dec 18, 2017 at 10:19 AM, Owen O'Malley <owen.omalley@gmail.com>
> wrote:
>
>> Actually, the metadata is reasonable; it is just that there is an array
>> above that column that doesn't have any elements.
>>
>> So the tree down to column 36 looks like:
>>
>> column 0: (struct) count: 42692
>> column 1: data (struct) count: 42692
>> column 21: listingAssociated (array) count: 42692
>> column 22: (struct) count: 0
>> column 32: sla (array) count: 0
>> column 33: (struct) count: 0
>> column 34: shippingTier (struct) count: 0
>> column 35: charge (struct) count: 0
>> column 36: amount (double) count: 0
>>
>> Since there are 0 instances of column 22, there aren't any instances
>> below that. So what should be happening is that the reader doesn't call
>> down to read the data because there are no values.
>>
>> Which version of ORC are you using to read with?
>>
>> Thanks,
>>    Owen
>>
>>
>> On Mon, Dec 18, 2017 at 5:38 AM, Piyush Mukati <piyush.mukati@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I have written an ORC file with a MapReduce job, but while reading the
>>> file I am getting "read past EOF for a double column".
>>> After debugging, I found that we are trying to read an empty stream. I
>>> suspect the file metadata is corrupt.
>>>
>>> The column metadata says:
>>> *Column 36: count: 0 hasNull: false sum: 0.0*
>>> I do not understand how hasNull can be false while count is zero,
>>> when other columns have nonzero counts.
>>>
>>> I am out of ideas for debugging. Please point me in the direction I
>>> should debug further.
>>> Please find attached the metadata and the stack trace.
>>> Thanks.
>>>
>>
>>
>
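
The point made in the thread, that a reader should not descend into a child column when the parent has zero instances, can be illustrated with a toy model. This is only a sketch under stated assumptions: `read_doubles` and `read_list_column` are hypothetical helpers, the 8-byte little-endian layout is chosen for the example, and none of this is the ORC reader code or the actual fix in ORC-285.

```python
import io
import struct


def read_doubles(stream, count):
    """Hypothetical leaf reader: decode `count` 8-byte doubles from `stream`."""
    values = []
    for _ in range(count):
        raw = stream.read(8)
        if len(raw) < 8:
            # Mirrors the symptom in the thread: the stream ran out of bytes.
            raise EOFError("read past EOF")
        values.append(struct.unpack("<d", raw)[0])
    return values


def read_list_column(lengths, child_stream):
    """Hypothetical list reader: `lengths` holds one element count per row."""
    total = sum(lengths)
    # Zero child values in this batch: skip the child reader entirely
    # rather than touching a stream that may be empty.
    children = read_doubles(child_stream, total) if total > 0 else []
    rows, pos = [], 0
    for n in lengths:
        rows.append(children[pos:pos + n])
        pos += n
    return rows


# Every list is empty, so the (empty) child stream is never read.
empty_rows = read_list_column([0, 0, 0], io.BytesIO(b""))

# Normal case: three child values spread over three rows.
full_rows = read_list_column(
    [2, 0, 1], io.BytesIO(struct.pack("<3d", 1.5, 2.5, 3.5))
)
```

In this model the parent computes how many child values the batch needs and only calls down when that number is positive, which is the behavior described above for column 22 and everything beneath it.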
