Date: Wed, 9 May 2018 20:51:00 +0000 (UTC)
From: "Prasanth Jayachandran (JIRA)"
To: dev@orc.apache.org
Subject: [jira] [Created] (ORC-362) String direct length streams gets some values even if data is null

Prasanth Jayachandran created ORC-362:
-----------------------------------------

             Summary: String direct length streams gets some values even if data is null
                 Key: ORC-362
                 URL: https://issues.apache.org/jira/browse/ORC-362
             Project: ORC
          Issue Type: Bug
    Affects Versions: 1.4.3
            Reporter: Prasanth Jayachandran


Observed this in one of the ORC files recently. Looking at the orcfiledump output (compression is NONE), something looks odd:

{code}
  Stream: column 2 section PRESENT start: 13976 length 80
  Stream: column 2 section DATA start: 14056 length 541
  Stream: column 2 section LENGTH start: 14597 length 13
..
..
..
 Row group indices for column 2:
      Entry 0: count: 4 hasNull: true min: Date Record First Seen at LOGSA max: Unit Identification Code Assigned to this DoDAAC sum: 157 positions: 0,0,0,0,0,0
      Entry 1: count: 5 hasNull: true min: The equipment-type-id of a specific WEAPON-TYPE (a role name for object-type-id). max: This column should always be blank. sum: 314 positions: 26,111,0,157,0,4
      Entry 2: count: 2 hasNull: true min: This column should always be blank. max: This column should always be blank. sum: 70 positions: 52,62,0,471,0,9
      Entry 3: count: 0 hasNull: true positions: 78,16,0,541,0,11
{code}

If we look at Entry 3 (the last entry) and relate its positions to the streams: the last entry is all nulls, and the corresponding data stream position is offset 541 (which is the same as the stream length).
The data stream looks correct. But if we look at the length stream, the position recorded in the last entry is 11, while the stream length is actually 13 (these last 2 bytes are not expected). If there is no data, the length stream is not supposed to record anything. If the data is null, only the isPresent stream is expected to have an entry. It looks like the ORC writer is writing entries to the length stream even if the data is null (probably recording 0 lengths).
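
The arithmetic behind this comparison can be rechecked with a small script. This is only a sketch: the numbers are copied from the orcfiledump output above, while the helper names and the choice of which position values belong to DATA and LENGTH are assumptions made for illustration, not anything taken from the ORC code base. In particular, it follows the report in treating the final position value as a byte offset into the LENGTH stream.

```python
# Values copied from the orcfiledump output in the report (column 2, compression NONE).
STREAM_LENGTHS = {"PRESENT": 80, "DATA": 541, "LENGTH": 13}

# (count, sum of string lengths, recorded positions) for each row group entry.
# Entry 3 has no `sum` in the dump, so it is recorded as None.
ROW_GROUPS = [
    (4, 157, [0, 0, 0, 0, 0, 0]),
    (5, 314, [26, 111, 0, 157, 0, 4]),
    (2, 70, [52, 62, 0, 471, 0, 9]),
    (0, None, [78, 16, 0, 541, 0, 11]),
]

# Assumption (hypothetical indices): the 4th value is the DATA byte offset and
# the last value is the one the report compares against the LENGTH stream size.
DATA_POS_INDEX = 3
LENGTH_POS_INDEX = -1


def expected_data_offsets(row_groups):
    """Return the bytes written to DATA before each entry, plus the grand total."""
    offsets, total = [], 0
    for _count, byte_sum, _positions in row_groups:
        offsets.append(total)
        total += byte_sum or 0
    return offsets, total


def trailing_gap(stream_length, last_position):
    """Bytes between the last recorded position and the end of the stream."""
    return stream_length - last_position


if __name__ == "__main__":
    offsets, total = expected_data_offsets(ROW_GROUPS)
    recorded = [positions[DATA_POS_INDEX] for _, _, positions in ROW_GROUPS]
    print("DATA offsets match the running sums:", offsets == recorded)
    print("DATA total equals stream length:", total == STREAM_LENGTHS["DATA"])
    last_length_pos = ROW_GROUPS[-1][2][LENGTH_POS_INDEX]
    print("LENGTH gap:", trailing_gap(STREAM_LENGTHS["LENGTH"], last_length_pos))
```

Under these assumptions the DATA positions (0, 157, 471, 541) line up with the running sums and the LENGTH gap comes out as 2, which is the discrepancy described above. One caveat: the final position values (0, 4, 9, 11) also equal the running count of non-null values, so they might be value counts within a run rather than byte offsets; the sketch does not settle which reading is correct.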