From: "ASF GitHub Bot (Jira)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Date: Sat, 12 Sep 2020 21:16:00 +0000 (UTC)
Subject: [jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters

     [ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=483125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483125 ]

ASF GitHub Bot logged work on HIVE-24151:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Sep/20 21:15
            Start Date: 12/Sep/20 21:15
    Worklog Time Spent: 10m
      Work Description: szlta opened a new pull request #1490:
                   URL: https://github.com/apache/hive/pull/1490

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 483125)
    Time Spent: 1h 20m  (was: 1h 10m)

> MultiDelimitSerDe shifts data if strings contain non-ASCII characters
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24151
>                 URL: https://issues.apache.org/jira/browse/HIVE-24151
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-22360 intended to fix another MultiDelimitSerDe problem (with NULL last columns) but introduced a regression: the approach of the fix is pretty much all wrong, as the existing logic that operated on bytes got replaced by regex matcher logic, which deals in character positions rather than byte positions. As some non-ASCII characters consist of more than one byte, the whole record may get shifted due to this.
> With this ticket I'm going to restore the old logic and apply the proper fix on top of that, while keeping (and extending) the test cases added with HIVE-22360, so that we have a solution for both issues.

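
The character-versus-byte mismatch described in the issue can be illustrated with a minimal Java sketch. This is not the Hive code or the fix from the pull request: the class name DelimiterOffsets, both helper methods, the "|~|" delimiter and the sample row are hypothetical and chosen only for illustration. It compares the delimiter position a regex Matcher reports (a char index) with the position of the same delimiter in the UTF-8 bytes of the row.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from MultiDelimitSerDe.
class DelimiterOffsets {

    // Position of the first delimiter as a regex Matcher reports it:
    // an index into the String's chars, not into its encoded bytes.
    static int charOffset(String row, String delimiter) {
        Matcher m = Pattern.compile(Pattern.quote(delimiter)).matcher(row);
        return m.find() ? m.start() : -1;
    }

    // Position of the first delimiter in the UTF-8 encoding of the row,
    // found by a plain byte-by-byte scan.
    static int byteOffset(String row, String delimiter) {
        byte[] data = row.getBytes(StandardCharsets.UTF_8);
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i + delim.length <= data.length; i++) {
            int j = 0;
            while (j < delim.length && data[i + j] == delim[j]) {
                j++;
            }
            if (j == delim.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "\u00C1d\u00E1m" is "Ádám": 4 chars, but 6 bytes in UTF-8.
        String row = "\u00C1d\u00E1m|~|Szita";
        System.out.println("char offset: " + charOffset(row, "|~|"));
        System.out.println("byte offset: " + byteOffset(row, "|~|"));
    }
}
```

For a pure-ASCII row the two offsets coincide, so the problem stays hidden. Once a field contains multi-byte characters, a char index applied to a byte buffer points too early, which is consistent with the shifted records reported above.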
-- This message was sent by Atlassian Jira (v8.3.4#803005)