orc-dev mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ORC-362) String direct length streams gets some values even if data is null
Date Wed, 09 May 2018 20:51:00 GMT
Prasanth Jayachandran created ORC-362:
-----------------------------------------

             Summary: String direct length streams gets some values even if data is null
                 Key: ORC-362
                 URL: https://issues.apache.org/jira/browse/ORC-362
             Project: ORC
          Issue Type: Bug
    Affects Versions: 1.4.3
            Reporter: Prasanth Jayachandran


Observed this in one of the orc files recently.

Looking at the orcfiledump output (compression is NONE), something looks odd:
{code}
    Stream: column 2 section PRESENT start: 13976 length 80
    Stream: column 2 section DATA start: 14056 length 541
    Stream: column 2 section LENGTH start: 14597 length 13
..
..
..
    Row group indices for column 2:
      Entry 0: count: 4 hasNull: true min: Date Record First Seen at LOGSA max: Unit Identification Code Assigned to this DoDAAC sum: 157 positions: 0,0,0,0,0,0
      Entry 1: count: 5 hasNull: true min: The equipment-type-id of a specific WEAPON-TYPE (a role name for object-type-id). max: This column should always be blank. sum: 314 positions: 26,111,0,157,0,4
      Entry 2: count: 2 hasNull: true min: This column should always be blank. max: This column should always be blank. sum: 70 positions: 52,62,0,471,0,9
      Entry 3: count: 0 hasNull: true positions: 78,16,0,541,0,11
{code}
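
A minimal sketch of how the stream lengths and index positions in a dump like this
could be compared mechanically. The helper names (stream_lengths, entry_positions) are
hypothetical and not part of ORC. Treating the fourth position as the DATA offset and
the last position as the LENGTH offset is an assumption that follows this report's
reading; the exact meaning of each position depends on the stream encoding.

```python
import re

# Excerpt of the dump from the report (only the lines this sketch needs).
DUMP = """
    Stream: column 2 section PRESENT start: 13976 length 80
    Stream: column 2 section DATA start: 14056 length 541
    Stream: column 2 section LENGTH start: 14597 length 13
      Entry 3: count: 0 hasNull: true positions: 78,16,0,541,0,11
"""

STREAM_RE = re.compile(
    r"Stream: column (\d+) section (\w+) start: (\d+) length (\d+)")
# \s* tolerates a line break between "positions:" and the numbers.
POSITIONS_RE = re.compile(r"positions:\s*(\d+(?:,\d+)*)")


def stream_lengths(dump_text, column):
    """Map section name to stream length for one column."""
    return {m.group(2): int(m.group(4))
            for m in STREAM_RE.finditer(dump_text)
            if int(m.group(1)) == column}


def entry_positions(dump_text):
    """Return the positions list of every row group index entry, in order."""
    return [[int(p) for p in m.group(1).split(",")]
            for m in POSITIONS_RE.finditer(dump_text)]


lengths = stream_lengths(DUMP, 2)
last = entry_positions(DUMP)[-1]

# Assumption (the report's reading): index 3 is the DATA position and the
# final value is the LENGTH position of the last row group entry.
data_gap = lengths["DATA"] - last[3]
length_gap = lengths["LENGTH"] - last[-1]
print(data_gap, length_gap)
```

Under that reading, a non-zero gap for a stream whose last row group is all nulls is
the kind of discrepancy this report is about.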

Entry 3 (the last entry) is all nulls. Relating its positions to the streams: the data
stream position is 541, which is the same as the stream length, so the data stream looks
correct. For the length stream, however, the last entry records position 11 while the stream
length is actually 13 (these last 2 bytes are not expected). If there is no data, the
length stream should not record anything; when a value is null, only the isPresent stream
is expected to have an entry. It looks like the ORC writer is writing entries to the length
stream even when the data is null (probably recording 0 lengths).
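
To make the expected behavior concrete, here is a toy model in Python of a direct string
encoding with three streams. It is only an illustration of the expectation stated above,
not ORC's writer code; the function name write_direct_strings is hypothetical, and the
streams are plain Python lists and bytes with no run-length encoding.

```python
def write_direct_strings(values):
    """Toy model: split nullable strings into present, data, and length streams."""
    present = []        # one flag per value, null or not
    data = bytearray()  # concatenated bytes of the non-null values only
    lengths = []        # one length per non-null value only
    for value in values:
        present.append(value is not None)
        if value is not None:
            encoded = value.encode("utf-8")
            data.extend(encoded)
            lengths.append(len(encoded))
    return present, bytes(data), lengths


present, data, lengths = write_direct_strings(["ab", None, "", None])
print(present, data, lengths)
```

In this model a null adds an entry only to the present stream, while a non-null empty
string still adds a 0 to the length stream. A writer that appended a 0 length for nulls
as well would produce extra length entries of the kind suspected here.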




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
